
Keep track of links that are unvisited due to failed responses #8

Open
innovationchef opened this issue May 25, 2018 · 2 comments

Comments

@innovationchef
Member

No description provided.

@innovationchef
Member Author

I still don't completely understand how the sitemap spider works. The spider keeps descending through sitemap.xml until it receives a valid page response. Somewhere between the first request and the final page, Scrapy redirects once from HTTP to HTTPS, but I can't figure out where that happens. I would expect to see a response with a 301 status at some point, but the process_response() in the middleware I wrote never sees it: the redirect seems to be handled somewhere internal that I can't log from a middleware, and only the final 200 responses come through. For the same reason, I can't log other 40x responses with process_response().

Could these responses also be handled internally? (That seems to be the only explanation.) If so, how do I track these statuses and log the URLs that return them? There seems to be an answer, but I'm not sure how to test it rigorously.
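
A possible explanation, assuming Scrapy's default middleware stack: the `process_response()` methods of downloader middlewares are called in decreasing order of their order numbers, and the built-in `RedirectMiddleware` is registered at 600 by default. A custom middleware with a lower number (543 is a common choice) only runs after the redirect has already been turned into a new request, so it never sees the 301. Non-2xx responses that do get through are also, by default, filtered by the `HttpErrorMiddleware` spider middleware before they reach spider callbacks, unless their codes are listed in the spider's `handle_httpstatus_list`. The sketch below is a hypothetical middleware (the class name and the `myproject.middlewares` path are placeholders) that records every non-200 status it sees; registering it above 600 should place it ahead of the redirect handling.

```python
import logging

logger = logging.getLogger(__name__)


class StatusLoggingMiddleware:
    """Hypothetical downloader middleware: records and logs every non-200 response it sees."""

    def __init__(self):
        # (url, status) pairs, in the order the responses arrived
        self.non_ok = []

    def process_response(self, request, response, spider):
        if response.status != 200:
            self.non_ok.append((response.url, response.status))
            logger.info("Got HTTP %d for %s", response.status, response.url)
        # Return the response unchanged so middlewares closer to the engine
        # (e.g. RedirectMiddleware at 600) still handle it as usual
        return response


# settings.py (placeholder module path). process_response() runs in decreasing
# order of these numbers, so 650 > 600 puts this middleware ahead of RedirectMiddleware.
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.StatusLoggingMiddleware": 650,
}
```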

So how do I test these? I can't generate a 402 response on my own (or maybe I just don't know how) to test the custom handlers for these responses.
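
One way to produce arbitrary statuses such as 402 is a throwaway local server. This is a minimal sketch using only Python's standard library (`http.server`); the `/status/<code>` URL scheme and the names `StatusHandler`, `start_server` and `fetch_status` are assumptions for illustration. Pointing a spider's start URLs at something like `http://127.0.0.1:<port>/status/402` should exercise the custom handlers without depending on a live site.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StatusHandler(BaseHTTPRequestHandler):
    """Answers GET /status/<code> with that status code; anything else gets 404."""

    def do_GET(self):
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "status" and parts[1].isdigit():
            code = int(parts[1])
        else:
            code = 404
        self.send_response(code)
        if code in (301, 302):
            # Redirect to a 200 page so redirect handling can be observed too
            self.send_header("Location", "/status/200")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr
        pass


def start_server(port=0):
    """Serve on 127.0.0.1 in a daemon thread; port=0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(url):
    """Return the final status code for url; urllib follows redirects itself."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Because `/status/301` redirects to `/status/200`, the same server also gives a reproducible redirect for checking which component handles it.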

@justinccdev
Member

I'm okay with simply dropping failed responses and not revisiting them. It might become an issue if a whole website failed, though.
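
If whole-site failures ever need catching, one option is to count outcomes per domain and flag a domain only when none of its requests succeeded, while still dropping individual failed URLs. A minimal sketch; `DomainFailureTracker` is a hypothetical helper, and calling `record()` from a request errback or a middleware is an assumption about where it would be wired in.

```python
from collections import Counter
from urllib.parse import urlsplit


class DomainFailureTracker:
    """Hypothetical helper: count successes and failures per domain so that
    single failed URLs can be dropped while a fully failed site is flagged."""

    def __init__(self):
        self.ok = Counter()
        self.failed = Counter()

    def record(self, url, succeeded):
        domain = urlsplit(url).netloc
        if succeeded:
            self.ok[domain] += 1
        else:
            self.failed[domain] += 1

    def fully_failed_domains(self):
        """Domains with at least one failure and no successful response."""
        return sorted(d for d in self.failed if self.ok[d] == 0)
```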
