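Collecting data from Reddit
Main steps:
- Create a Reddit app and choose the app type script
- Log in to Reddit with the app credentials to obtain authorization
- Once authorized, collect data from Reddit
If you are only testing, use a dedicated test account.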
Useful links
Creating a Reddit app
After logging in to Reddit, go to the Reddit app list page and create a Reddit app. The app created in this example is of the script type.
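Once the script app exists, its client ID and secret have to reach the code somehow. A minimal sketch of one option, reading them from environment variables rather than hardcoding them, is shown here. The variable names (REDDIT_CLIENT_ID and so on) and the helper load_reddit_credentials are assumptions made for illustration; they are not part of Reddit or PRAW.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
REQUIRED_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def load_reddit_credentials(env=None):
    """Return a dict of credential keyword arguments read from the environment.

    Raises KeyError listing every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in REQUIRED_VARS.items()}
```

The returned dict uses the same keyword names that praw.Reddit accepts, so it could be passed as praw.Reddit(user_agent="...", **load_reddit_credentials()).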
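Reddit API usage rules
To keep the developer account from being banned, you need to follow Reddit's API rules. The full rules are here: https://github.com/reddit-archive/reddit/wiki/API
The main points to watch:
- At most 60 requests per minute
- User-Agent format (spoofing the User-Agent is not allowed): <platform>:<app ID>:<version string> (by /u/<reddit username>). For example: android:com.example.myredditapp:v1.2.3 (by /u/kemitche)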
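The two rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an official helper: build_user_agent and MinIntervalThrottle are hypothetical names, and the throttle simply spaces calls evenly (one per second for a 60-per-minute budget). PRAW generally manages rate limiting on its own, so a throttle like this mainly matters if you issue requests by hand.

```python
import time


def build_user_agent(platform, app_id, version, username):
    """Format a User-Agent as <platform>:<app ID>:<version string> (by /u/<reddit username>)."""
    return f"{platform}:{app_id}:{version} (by /u/{username})"


class MinIntervalThrottle:
    """Space calls evenly so that no more than max_per_minute happen per minute."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self._clock = clock      # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling throttle.wait() immediately before each request keeps a hand-written client inside the per-minute budget.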
Example: fetching TikTok-related posts from Reddit
#!/usr/bin/env python3
# coding: UTF-8
import datetime as dt

import pandas as pd
import praw

# Script-type app: authenticate with the app's client ID and secret plus the
# account's username and password. Replace the placeholders with your own values.
reddit = praw.Reddit(
    client_id='your-clientID',
    client_secret='your secret',
    user_agent='your_platform:dev_tmp:v0.1 (by /u/dev_tmp)',
    # redirect_uri='http://localhost:8080',
    username='dev_tmp',
    password='HGFhgf123',
)
# print(reddit.auth.url(["identity"], "...", "permanent"))
# Prints the logged-in username if authorization succeeded.
print(reddit.user.me())

# subreddit_all is a <class 'praw.models.reddit.subreddit.Subreddit'>
# (named subreddit_all so that the built-in all() is not shadowed).
# Usage: https://praw.readthedocs.io/en/latest/code_overview/models/subreddit.html
subreddit_all = reddit.subreddit("all")
print(type(subreddit_all))

# Each submission is a <class 'praw.models.reddit.submission.Submission'>.
# Attribute list: https://praw.readthedocs.io/en/latest/code_overview/models/submission.html
messages = {
    "id": [],
    "url": [],
    "title": [],
    "score": [],
    "comms_num": [],
    "body": [],
    "created": [],
}
for submission in subreddit_all.search("tiktok", limit=5):
    messages["id"].append(submission.id)
    messages["url"].append(submission.url)
    messages["title"].append(submission.title)
    messages["score"].append(submission.score)
    messages["comms_num"].append(submission.num_comments)
    messages["body"].append(submission.selftext)
    # created_utc is a Unix timestamp; store it as an ISO 8601 string in UTC.
    created = dt.datetime.fromtimestamp(submission.created_utc, tz=dt.timezone.utc)
    messages["created"].append(created.isoformat())

# search() returns a <class 'praw.models.listing.generator.ListingGenerator'>.
# Other PRAW types: https://praw.readthedocs.io/en/latest/code_overview/other.html
# res = subreddit_all.search("tiktok")
# print(type(res))

# Write the collected posts to data.csv without the index column.
data = pd.DataFrame(messages)
data.to_csv('data.csv', index=False)
Resulting CSV: