TwitterWebsiteSearch is a Python script for searching and saving data from Twitter.com search without using the official Twitter API. This bypasses some of the limitations of the official Twitter API:
- Get tweets older than 7 days.
Extracted tweets are formatted similarly to tweet objects in the official API.
Each tweet is a Python dict with the following structure:
{
'created_at' : UTC-datetime format '%Y-%m-%d %H:%M:%S' ,
'id_str' : "",
'text' : "",
'entities': {
'hashtags': [],
'symbols':[],
'user_mentions':[],
'urls':[],
'media': [] # optional
},
'user' : {
'id_str' : "",
'name' : "",
'screen_name': "",
'profile_image_url': "",
'verified': bool
},
'retweet_count' : 0,
'favorite_count' : 0,
'lang' : None,
'is_quote_status' : False,
'quoted_status_id_str' : "", # optional
'quoted_status' : {}, # optional
'in_reply_to_user_id': None,
'in_reply_to_screen_name' : None,
'contains_photo': False,
'contains_video': False
}
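As a minimal sketch of working with this structure, the snippet below parses `created_at` using the format string listed above and reads the keys marked optional without raising `KeyError`. The helper names and the sample tweet are hypothetical, and it assumes `created_at` arrives either as a string in that format or as a `datetime` already.

```python
from datetime import datetime, timezone

# Format string taken from the 'created_at' description above.
CREATED_AT_FORMAT = '%Y-%m-%d %H:%M:%S'


def parse_created_at(tweet):
    """Return 'created_at' as a timezone-aware UTC datetime (hypothetical helper)."""
    value = tweet['created_at']
    if not isinstance(value, datetime):
        # Assumption: the value is a string in the documented format.
        value = datetime.strptime(value, CREATED_AT_FORMAT)
    return value.replace(tzinfo=timezone.utc)


def optional_fields(tweet):
    """Read the keys marked optional, falling back to defaults when absent."""
    return {
        'media': tweet.get('entities', {}).get('media', []),
        'quoted_status_id_str': tweet.get('quoted_status_id_str'),
        'quoted_status': tweet.get('quoted_status'),
    }


# Made-up tweet for illustration; real values come from the search results.
sample_tweet = {
    'created_at': '2016-03-01 12:30:00',
    'id_str': '1',
    'text': 'hello',
    'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []},
    'is_quote_status': False,
}

created = parse_created_at(sample_tweet)
extras = optional_fields(sample_tweet)
```

Using `dict.get` for the optional keys keeps the calling code the same whether or not a tweet carries media or a quoted status.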
Note: pass the query without URL encoding.
from TwitterWebsiteSearch import TwitterClient, SearchQuery
client = TwitterClient()
query = SearchQuery('#python')
count = 0
for page in client.get_search_iterator(query):
    for tweet in page['tweets']:
        print("{0} id: {1} text: {2}".format(count, tweet['id_str'], tweet['text']))
        count += 1
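Since the script is meant for saving data as well as searching, here is a hedged sketch of writing results to a JSON Lines file. `save_tweets` is a hypothetical helper, not part of TwitterWebsiteSearch; it only assumes that each page is a dict with a `'tweets'` list, as in the loop above. Stand-in pages are used so the snippet runs without network access.

```python
import json
import os
import tempfile


def save_tweets(pages, path):
    """Write every tweet from an iterable of result pages, one JSON object per line."""
    saved = 0
    with open(path, 'w', encoding='utf-8') as out:
        for page in pages:
            for tweet in page['tweets']:
                # default=str covers values json cannot encode natively,
                # for example if 'created_at' is a datetime object.
                out.write(json.dumps(tweet, ensure_ascii=False, default=str) + '\n')
                saved += 1
    return saved


# Stand-in pages with the same shape as in the loop above. With the real
# client, this would be client.get_search_iterator(query) instead.
fake_pages = [
    {'tweets': [{'id_str': '1', 'text': 'first'}, {'id_str': '2', 'text': 'second'}]},
    {'tweets': [{'id_str': '3', 'text': 'third'}]},
]

output_path = os.path.join(tempfile.gettempdir(), 'tweets.jsonl')
saved_count = save_tweets(fake_pages, output_path)
```

One object per line means the file can be appended to and read back incrementally, which suits long-running searches.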
Useful resources for creating search queries:
- http://www.followthehashtag.com/help/hidden-twitter-search-operators-extra-power-followthehashtag/
- https://twitter.com/search-advanced
- https://dev.twitter.com/rest/public/search

Useful filters for excluding results from a search:
- -filter:nativeretweets
- -filter:replies
- -filter:links
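The filters above can be appended to the search terms as plain text. The sketch below shows one way to assemble such a query string; `build_query` is a hypothetical helper, and passing its result to `SearchQuery` is assumed to work the same way as the `'#python'` example.

```python
def build_query(terms, exclude=()):
    """Join search terms with negated filter operators, without URL encoding."""
    parts = [terms.strip()]
    # Each name becomes a "-filter:<name>" operator, e.g. "-filter:replies".
    parts.extend('-filter:{0}'.format(name) for name in exclude)
    return ' '.join(parts)


query_text = build_query('#python', exclude=['nativeretweets', 'replies', 'links'])
print(query_text)

# With the library installed, the string could then be used as:
#   query = SearchQuery(query_text)
```

The string is left unencoded on purpose, since the query should be passed without URL encoding.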