
404 HTTP status code is not handled or not allowed #92

Closed
Sebastokratos42 opened this issue Sep 18, 2020 · 19 comments

Comments

@Sebastokratos42

Hello!

I used TweetScraper without any problems yesterday, but today the following issue keeps coming up:

2020-09-18 08:59:36 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://twitter.com/i/search/timeline?l=&f=tweets&q=%23dfl%20since%3A2020-09-16&src=typed&max_position=>: HTTP status code is not handled or not allowed

I already changed my IP address via VPN and changed the user agent, but the issue remains. How can I solve this problem? Did Twitter change the search URL? Opening the URL "https://twitter.com/i/search/timeline?l=&f=tweets&q=%23dfl%20since%3A2020-09-16&src=typed&max_position" directly also leads to a 404 response. Thanks for your help!

@Buaasinong

me too

@imfht

imfht commented Sep 18, 2020

me too.

@yangyangdotcom

same here

@adriprmn

same here.

@adriprmn

any luck?

@kujbika

kujbika commented Sep 20, 2020

I also face this issue.

@Spaskich

When trying to access https://twitter.com/search-home, it redirects to https://twitter.com/explore. I think Twitter has removed this functionality altogether.

@ASIAMI

ASIAMI commented Sep 21, 2020

There is now https://twitter.com/search-advanced?lang=en-GB. I tried to change the code, but I am new to this, so...
I got the error ValueError: Expecting value: line 1 column 1 (char 0) - it comes from the decode line. I think my items are null, so the XPath probably needs to be adapted. After changing the URL, when I inspect the response locally I do get results, so the request itself works. What doesn't work is the items.

@irwanOyong

Twitter changed things a little bit, so we need to alter the URL in the TweetScraper class to self.url = "https://twitter.com/search/?lang={}".format(lang)

But unfortunately there is still the error @ASIAMI mentioned above:

File "/home/airone/external-git/TweetScraper/TweetScraper/spiders/TweetCrawler.py", line 44, in parse_page
    data = json.loads(response.body.decode("utf-8"))
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

meaning that they might have also changed the structure a little bit. Any workaround?
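
A minimal sketch of a more defensive parse_page, assuming only what the traceback above shows (the rest of the spider is not reproduced here, so treat names and structure as illustrative):

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_page(self, response):
    # Decode the body as before, but don't assume it is valid JSON:
    # if Twitter now returns an HTML page (or an empty body), log what
    # actually came back instead of dying with a JSONDecodeError.
    body = response.body.decode("utf-8")
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        logger.warning("Non-JSON response (HTTP %s), first bytes: %.200s",
                       response.status, body)
        return
    # ... continue with the original handling of `data` here ...
```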

@ASIAMI

ASIAMI commented Sep 22, 2020

Working on it, but I still have the same error. The XPath must be changed - I think it is easy to fix, but it needs someone with experience in it.

@shiwenThu

shiwenThu commented Sep 23, 2020 via email

@shiwenThu

Working on it, but I still have the same error. The XPath must be changed - I think it is easy to fix, but it needs someone with experience in it.

I have some experience with XPath, but I am not familiar with how the URL should be changed.
I have some time to work on it tomorrow. Can you share your progress and problems with me?

@shiwenThu

Twitter changed things a little bit, so we need to alter the URL in the TweetScraper class to self.url = "https://twitter.com/search/?lang={}".format(lang)

But unfortunately there is still the error @ASIAMI mentioned above:
File "/home/airone/external-git/TweetScraper/TweetScraper/spiders/TweetCrawler.py", line 44, in parse_page
    data = json.loads(response.body.decode("utf-8"))
File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
meaning that they might have also changed the structure a little bit. Any workaround?

I agree that the URL should be changed, but I do not think "https://twitter.com/search/?lang={}".format(lang) is the right answer, although it is the URL displayed when users search for tweets in a browser. The right URL should respond with data containing 'min_position'.
In fact, I am curious how @jonbakerfish discovered the old URL, which may help us deal with the current situation.

@jonbakerfish
Owner

Hi guys, the way I found out the URL for searching was by using Chrome's DevTools to monitor the network activity.

Twitter changed the URL for searching and the format of the returned data, so you need to modify the code accordingly. Here I give you an example, and PRs are welcome.

When you search (e.g., for abc), you will find that a URL like this is issued:

https://api.twitter.com/2/search/adaptive.json?
include_profile_interstitial_type=1
&include_blocking=1
&include_blocked_by=1
&include_followed_by=1
&include_want_retweets=1
&include_mute_edge=1
&include_can_dm=1
&include_can_media_tag=1
&skip_status=1
&cards_platform=Web-12
&include_cards=1
&include_ext_alt_text=true
&include_quote_count=true
&include_reply_count=1
&tweet_mode=extended
&include_entities=true
&include_user_entities=true
&include_ext_media_color=true
&include_ext_media_availability=true
&send_error_codes=true
&simple_quoted_tweet=true
&q=abc
&count=20
&query_source=typed_query
&pc=1
&spelling_corrections=1
&ext=mediaStats%2ChighlightedLabel

When you scroll down to the bottom of the page, the same URL is issued again, but with a cursor this time, which is used for loading more tweets from the servers:

https://api.twitter.com/2/search/adaptive.json?
include_profile_interstitial_type=1
&include_blocking=1
&include_blocked_by=1
&include_followed_by=1
&include_want_retweets=1
&include_mute_edge=1
&include_can_dm=1
&include_can_media_tag=1
&skip_status=1
&cards_platform=Web-12
&include_cards=1
&include_ext_alt_text=true
&include_quote_count=true
&include_reply_count=1
&tweet_mode=extended
&include_entities=true
&include_user_entities=true
&include_ext_media_color=true
&include_ext_media_availability=true
&send_error_codes=true
&simple_quoted_tweet=true
&q=abc
&count=20
&query_source=typed_query
&cursor=scroll%3AthGAVUV0VFVBYBFoDA5vOhzpWqJBIY1AISY8LrAAAB9D-AYk3S8an8AAAAKBIqQzEPVYAAEioifQPWoAISKjCkGxfQABIqRohJVgACEipDLymXcAISKi4vkpfQAxIp9qV-V9ADEioDygCXsAASKi_YX1ewABIqQ24Yl3AAEinrgDJXYAASKgwYq9eQAhIp-fq_FpAHEiomnoUWoAISKiYu2dawAhIqQazWl4ABEioVvr6WsAISKgbowZdwABIqRmtB12AFEio2DxKXcAYSKj_m_ReAARIqP3OwV4ABEiozbLXXgAASKkSyQZfQARIqRLolVrAAEioTWJoXkAISKggPHNaQBhIqFdDFlpAAEioRojMXcAgSKjdOLpawBhIqAajZV7ADEinyOp2XsAASKeST6xYAAxIqRLVBl5ABEipGHaaWoAASKkPnk9eQARIqEoqcF4ACEipGeS3XgAASKh1fvhawARIqRd1117ABJQAVACUAERW4gnoVgIl6GARVU0VSFQAVABVQFQIVAAA%3D
&pc=1
&spelling_corrections=1
&ext=mediaStats%2ChighlightedLabel
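
For illustration, here is a rough sketch of how that query string could be assembled in Python. The function and constant names are my own, and only a subset of the flags above is included; the full list can be copied from the requests DevTools shows:

```python
from urllib.parse import urlencode

SEARCH_API = "https://api.twitter.com/2/search/adaptive.json"


def build_search_url(query, count=20, cursor=None):
    """Reproduce (a subset of) the adaptive search URL captured above."""
    params = {
        "tweet_mode": "extended",
        "include_entities": "true",
        "include_quote_count": "true",
        "include_reply_count": "1",
        "send_error_codes": "true",
        "simple_quoted_tweet": "true",
        "q": query,
        "count": count,
        "query_source": "typed_query",
        "pc": "1",
        "spelling_corrections": "1",
        "ext": "mediaStats,highlightedLabel",
    }
    if cursor is not None:
        # The second request above is identical except for this cursor,
        # which asks the server for the next batch of tweets.
        params["cursor"] = cursor
    return SEARCH_API + "?" + urlencode(params)


# e.g. build_search_url("abc") matches the shape of the first request above.
```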

The response to this URL looks like this; in it you can find all the tweets and the cursor for the next request:

{globalObjects: {tweets: {,…},…}, timeline: {id: "search-6714704651894409473",…}}
globalObjects: {tweets: {,…},…}
broadcasts: {}
cards: {}
lists: {}
media: {}
moments: {}
places: {}
topics: {}
tweets: {,…}
    1308836102560768000: {created_at: "Wed Sep 23 18:30:18 +0000 2020", id: 1308836102560768000, id_str: "1308836102560768000",…}
    1308843500293828608: {created_at: "Wed Sep 23 18:59:42 +0000 2020", id: 1308843500293828600, id_str: "1308843500293828608",…}
    1308848357377560579: {created_at: "Wed Sep 23 19:19:00 +0000 2020", id: 1308848357377560600, id_str: "1308848357377560579",…}
    1308852022070906887: {created_at: "Wed Sep 23 19:33:34 +0000 2020", id: 1308852022070907000, id_str: "1308852022070906887",…}
    1308862807832768512: {created_at: "Wed Sep 23 20:16:25 +0000 2020", id: 1308862807832768500, id_str: "1308862807832768512",…}
    1308866238454657024: {created_at: "Wed Sep 23 20:30:03 +0000 2020", id: 1308866238454657000, id_str: "1308866238454657024",…}
    1308877845746331654: {created_at: "Wed Sep 23 21:16:10 +0000 2020", id: 1308877845746331600, id_str: "1308877845746331654",…}
    1308878030044098568: {created_at: "Wed Sep 23 21:16:54 +0000 2020", id: 1308878030044098600, id_str: "1308878030044098568",…}
    1308879028238123010: {created_at: "Wed Sep 23 21:20:52 +0000 2020", id: 1308879028238123000, id_str: "1308879028238123010",…}
    1308882628116910080: {created_at: "Wed Sep 23 21:35:11 +0000 2020", id: 1308882628116910000, id_str: "1308882628116910080",…}
    1308896562035204098: {created_at: "Wed Sep 23 22:30:33 +0000 2020", id: 1308896562035204000, id_str: "1308896562035204098",…}
    1308911248063574016: {created_at: "Wed Sep 23 23:28:54 +0000 2020", id: 1308911248063574000, id_str: "1308911248063574016",…}
    1308928407816863745: {created_at: "Thu Sep 24 00:37:05 +0000 2020", id: 1308928407816863700, id_str: "1308928407816863745",…}
    1308928903025754113: {created_at: "Thu Sep 24 00:39:03 +0000 2020", id: 1308928903025754000, id_str: "1308928903025754113",…}
    1308930852294983681: {created_at: "Thu Sep 24 00:46:48 +0000 2020", id: 1308930852294983700, id_str: "1308930852294983681",…}
    1308935459171708929: {created_at: "Thu Sep 24 01:05:06 +0000 2020", id: 1308935459171709000, id_str: "1308935459171708929",…}
    1308936068184629253: {created_at: "Thu Sep 24 01:07:32 +0000 2020", id: 1308936068184629200, id_str: "1308936068184629253",…}
    1308937253486579712: {created_at: "Thu Sep 24 01:12:14 +0000 2020", id: 1308937253486579700, id_str: "1308937253486579712",…}
    1308938288925937664: {created_at: "Thu Sep 24 01:16:21 +0000 2020", id: 1308938288925937700, id_str: "1308938288925937664",…}
    1308938422069977089: {created_at: "Thu Sep 24 01:16:53 +0000 2020", id: 1308938422069977000, id_str: "1308938422069977089",…}
users: {3135241: {id: 3135241, id_str: "3135241", name: "RedState", screen_name: "RedState",…},…}
timeline: {id: "search-6714704651894409473",…}
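
And a rough sketch of a Scrapy callback that could consume a response shaped like this: it pulls the tweets out of globalObjects.tweets and then looks for the next cursor somewhere inside timeline. The preview above only shows created_at/id_str per tweet and does not show exactly where the cursor lives, so the other field names and the cursor lookup are assumptions:

```python
import json

import scrapy
from w3lib.url import add_or_replace_parameter  # w3lib ships with Scrapy


def parse_search(self, response):
    """Meant as a method on the spider, registered as the search callback."""
    data = json.loads(response.text)

    # Every tweet on this page sits under globalObjects.tweets, keyed by id.
    # Fields other than created_at / id_str (e.g. full_text, which comes
    # with tweet_mode=extended) are assumptions, not shown in the preview.
    tweets = data["globalObjects"]["tweets"]
    for tweet_id, tweet in tweets.items():
        yield {
            "id": tweet["id_str"],
            "created_at": tweet["created_at"],
            "text": tweet.get("full_text"),
        }

    # The preview does not show where the next cursor lives inside
    # `timeline`, so walk the structure and take the first cursor value
    # found -- an assumption, not the confirmed layout.
    cursor = _find_cursor(data.get("timeline", {}))
    if cursor and tweets:
        yield scrapy.Request(
            add_or_replace_parameter(response.url, "cursor", cursor),
            callback=self.parse_search,
        )


def _find_cursor(node):
    """Recursively look for a {'cursor': {'value': ...}} shaped entry."""
    if isinstance(node, dict):
        cursor = node.get("cursor")
        if isinstance(cursor, dict) and "value" in cursor:
            return cursor["value"]
        for value in node.values():
            found = _find_cursor(value)
            if found:
                return found
    elif isinstance(node, list):
        for item in node:
            found = _find_cursor(item)
            if found:
                return found
    return None
```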

I'm currently working on some other projects and may update the code later. Hope this can help and PRs are welcome.

@ASIAMI

ASIAMI commented Sep 24, 2020

OK, with this new URL I got my response, but should I use bs4, since the response is not JSON? According to the information from the Scrapy docs:

Handling different response formats

Once you have a response with the desired data, how you extract the desired data from it depends on the type of response:

If the response is HTML or XML, use selectors as usual.

If the response is JSON, use json.loads() to load the desired data from response.text.
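
Following that documentation, a small sketch of a callback that branches on the response type instead of assuming JSON. Checking the Content-Type header is my assumption; the thread above does not show what Twitter actually sends:

```python
import json


def parse(self, response):
    """Meant as a spider callback; branch on the response type."""
    content_type = response.headers.get("Content-Type", b"").decode()
    if "application/json" in content_type:
        # JSON response: load it directly, as the Scrapy docs suggest.
        data = json.loads(response.text)
        # ... handle `data` ...
    else:
        # HTML/XML response: fall back to selectors.
        for title in response.css("title::text").getall():
            self.logger.info("Got an HTML page titled %r", title)
```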

@irwanOyong

Has anyone finished the update?

I tried it myself but haven't managed a successful run yet.

@ASIAMI

ASIAMI commented Sep 26, 2020

Not yet. I've tried with bs4 and got something, but I'm still working on it.

@ASIAMI

ASIAMI commented Sep 26, 2020

@jonbakerfish Can you help me? I can see everything you wrote about, but the api.twitter.com URL is not allowed. How do I get at this data?

@AndyAsare

Same problem here!
