How to scrape a large body of tweets (e.g. a million)? #261
Hey @aliiabbasi, I am currently trying to scrape tweets containing the word "coronavirus" for sentiment analysis, so I can explain my workaround; maybe it will give you some ideas :) I tried to download every tweet since January, but it didn't work well: 24 hours of scraping wasn't enough. Splitting the request into one query per day might be a good way to do it, but I don't want to wait approximately 2 days (the full request for a single day took 1 hour!). So what I am doing now is scraping 1000 "top tweets" for each day. What I'm observing is:
Example:
I'm still trying to find ways to analyze COVID-19-related tweets, though. If you find better ways to scrape a lot of tweets, don't hesitate to comment!
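A minimal sketch of the per-day split, assuming each window's dates are passed to the scraper's since/until filters (in GetOldTweets3 that would presumably be `setSince` and `setUntil`, combined with `setTopTweets(True)` and `setMaxTweets(1000)` for a 1000-top-tweets-per-day setup). The `daily_windows` helper is hypothetical: it only generates the date ranges and never calls Twitter.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def daily_windows(start: date, end: date) -> Iterator[Tuple[str, str]]:
    """Yield one (since, until) pair of 'YYYY-MM-DD' strings per day in [start, end).

    Assumption: 'until' is treated as exclusive, so consecutive windows
    share a boundary date without overlapping.
    """
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        yield day.isoformat(), nxt.isoformat()
        day = nxt


if __name__ == "__main__":
    windows = list(daily_windows(date(2020, 1, 1), date(2020, 4, 1)))
    print(len(windows))  # 91 (2020 is a leap year)
    print(windows[0])    # ('2020-01-01', '2020-01-02')
```

Each window then becomes one small query, which is easier to retry or resume than a single request spanning months.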
How is the topTweets criterion calculated? Is it a direct sort by retweets, or a combination of retweets, favourites, and replies?
@anshumanchak The topTweets criterion is determined by Twitter. You can ask Twitter to show you the tweets it considers most relevant; normally those are the tweets with high engagement across the network.
Thanks @Victorpc98
I tried something similar. Fetching 1000 tweets took me 2 min 35 s, so imagine 1 million tweets...
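Extrapolating linearly from that measurement (1000 tweets in 2 min 35 s, i.e. 155 s) gives a rough idea of the time needed for a million tweets. This is a back-of-the-envelope sketch that assumes constant throughput; pauses between queries or rate limiting would make a real run slower.

```python
def estimate_hours(target_tweets: int, sample_tweets: int = 1000,
                   sample_seconds: float = 155.0) -> float:
    """Linearly extrapolate total scraping time (in hours) from one measured batch.

    Defaults use the reported figure: 1000 tweets in 2 min 35 s (155 s).
    Assumes throughput stays constant for the whole run.
    """
    seconds_per_tweet = sample_seconds / sample_tweets
    return target_tweets * seconds_per_tweet / 3600


print(f"{estimate_hours(1_000_000):.1f} hours")  # 43.1 hours
```

That is about 155,000 s, or nearly two days of uninterrupted scraping.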
I'm currently trying to scrape tweets for sentiment analysis. I'm also using setNear("County in Ireland") and setQuerySearch("Covid19" or "covid" or "corona" or ......), but I'm not getting many tweets. Is there a way to scrape more tweets?
Hi @SRaina11, when I did this project I had the same issue. If I remember correctly, you can get more tweets by querying short time periods. For example, instead of getting the tweets from the last 6 months in one query, you can split it into 6x4 queries, one per week, or 6x4x7 queries, one per day. Sometimes it didn't work well and stopped after some queries; you could try waiting before each query so the API doesn't block your requests. I hope it helps!
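The weekly split plus a pause before each query could look like the sketch below. `fetch` is a hypothetical placeholder for the actual scraping call (it is not a library function), and the 5-second pause and 3 retries are arbitrary assumptions, not documented Twitter limits.

```python
import time
from datetime import date, timedelta
from typing import Callable, List, Tuple

Window = Tuple[str, str]


def weekly_windows(start: date, end: date) -> List[Window]:
    """Split [start, end) into consecutive 7-day windows (the last may be shorter)."""
    windows = []
    day = start
    while day < end:
        nxt = min(day + timedelta(days=7), end)
        windows.append((day.isoformat(), nxt.isoformat()))
        day = nxt
    return windows


def scrape_in_chunks(fetch: Callable[[str, str], list], windows: List[Window],
                     pause_s: float = 5.0, retries: int = 3) -> list:
    """Run one query per window, pausing between queries and retrying failures.

    `fetch(since, until)` is a hypothetical stand-in for the real scraping call;
    it should return a list of tweets or raise an exception on failure.
    """
    results = []
    for since, until in windows:
        for attempt in range(retries):
            try:
                results.extend(fetch(since, until))
                break
            except Exception:
                # Wait a little longer after each failed attempt before retrying.
                time.sleep(pause_s * (attempt + 1))
        else:
            # Runs only if every attempt failed (the loop never hit `break`).
            print(f"giving up on {since}..{until}")
        time.sleep(pause_s)  # pause before the next window's query
    return results
```

For six months, `weekly_windows(date(2020, 1, 1), date(2020, 7, 1))` yields 26 windows; swapping the 7-day step for 1 day gives the per-day variant.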
Thank you for your response @julienbeisel. Do you know if we can use an OR operator to find tweets matching multiple keywords? For example: tweetCriteria.setQuerySearch("Covid19" or "covid" or "corona")? Cheers
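One caveat with the example above: in Python, `"Covid19" or "covid" or "corona"` is evaluated before the call and is simply `"Covid19"`, so only the first keyword is searched. Twitter's search syntax accepts an uppercase `OR` between terms inside a single query string. The sketch below builds such a string; `or_query` is a hypothetical helper, and it assumes the library forwards the query text to Twitter search unchanged.

```python
keywords = ["Covid19", "covid", "corona"]

# Python's `or` returns its first truthy operand, so this whole expression is
# simply "Covid19"; the other keywords never reach the scraper.
naive = "Covid19" or "covid" or "corona"
print(naive)  # Covid19


def or_query(terms):
    """Join search terms with Twitter's uppercase OR operator.

    Multi-word terms are wrapped in double quotes so they are matched as phrases.
    """
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


query = or_query(keywords)
print(query)  # Covid19 OR covid OR corona
# Assumption: the resulting string is passed as the query, e.g.
# tweetCriteria.setQuerySearch(query)
```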
I'd like to know exactly what criteria decide which tweets are extracted as top tweets. Is it the number of likes or the number of retweets?
And is it possible to extract a large body of tweets (e.g. a million tweets) using this library?
I tried it for almost 800K tweets, but it neither responded nor raised an error!
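For runs this long, one way to avoid losing everything when the scraper stalls silently is to query in date windows and write each window's results to disk as soon as it finishes, so a restart can skip completed windows. A minimal sketch, assuming a hypothetical `fetch(since, until)` that returns JSON-serializable tweets (e.g. plain text or dicts); this is a workaround pattern, not a feature of the library itself.

```python
import json
import os
from typing import Callable, Iterable, Set, Tuple


def done_windows(path: str) -> Set[str]:
    """Return the 'since' dates already saved in the JSON Lines checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return {json.loads(line)["since"] for line in f if line.strip()}


def scrape_with_checkpoint(fetch: Callable[[str, str], list],
                           windows: Iterable[Tuple[str, str]], path: str) -> int:
    """Fetch each (since, until) window, appending its results to `path` right away.

    Windows already recorded in the file are skipped, so a crashed or hung run
    can be restarted without redoing finished work. Returns the number of
    windows fetched in this call.
    """
    finished = done_windows(path)
    fetched = 0
    for since, until in windows:
        if since in finished:
            continue
        tweets = fetch(since, until)
        with open(path, "a", encoding="utf-8") as f:
            # One JSON object per line per window: easy to append and to resume.
            record = {"since": since, "until": until, "tweets": tweets}
            f.write(json.dumps(record) + "\n")
        fetched += 1
    return fetched
```

If the process hangs, killing and rerunning it with the same `path` continues from the first window that was not saved.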