User Guide | Q&A | 用户指南 | 问答 #7
How should I set up the config file? Could you share a template for a production environment?
Make sure that Scrapyd has been installed and started.
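A quick way to confirm that Scrapyd is up is to poll its daemonstatus.json endpoint; a minimal sketch, assuming the default host and port:

```python
import requests

# Scrapyd listens on port 6800 by default; adjust host/port to your deployment.
resp = requests.get("http://127.0.0.1:6800/daemonstatus.json", timeout=5)
resp.raise_for_status()
print(resp.json())  # e.g. {"status": "ok", "running": 0, "pending": 0, ...}
```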
FYI: How to efficiently manage your distributed web scraping projects
Leave your comment here for help, in English if possible.
CPU usage is too high when I refresh the statistics, and the web interface keeps loading for a long time. I am using ScrapydWeb to manage 25 Scrapyd server nodes on five virtual machines in the same LAN. How can I solve this?
@zhangshengchun Try the latest version v1.0.0 (which fixed issue 14) and remember to set …
@my8100 The problem of high CPU usage has been solved. Thanks a lot!
@my8100 ScrapydWeb works very well; it's a great project! Now I have a question: the finished job list in …
@zhangshengchun You have to manually restart the Scrapyd service. (Note that this operation would also reset the running jobs, since Scrapyd keeps all job data in memory.) That's why ScrapydWeb displays the finished jobs in descending order.
@zhangshengchun Now you can set the DASHBOARD_FINISHED_JOBS_LIMIT option in v1.1.0 to control the number of finished jobs displayed.
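For reference, ScrapydWeb options like this live in its Python settings file; a minimal sketch, assuming the usual settings-file layout (the exact file name varies by version, and the value is illustrative):

```python
# scrapydweb_settings_vN.py — the settings file ScrapydWeb loads on startup
DASHBOARD_FINISHED_JOBS_LIMIT = 100  # cap the finished jobs shown on the dashboard
```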
Do you have a Docker deploy version?
Why not try it yourself and share with us?
OK. If there is one, I don't have to do it. If not, I will do it and share it.
FYI
Hello, I have some questions. How can I start more than one spider on a single Scrapyd server without using a timer task? In my project, I need to start lots of spiders at the same time (like a batch start), or stop all the spiders on one Scrapyd server. Thanks for your reply!
It's not supported yet, as I don't think it's a common practice. You can use the Requests library in a Python script instead.
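For example, a minimal sketch that batch-starts and batch-stops spiders through Scrapyd's own HTTP API (the address, project name, and spider names below are placeholders):

```python
import requests

SCRAPYD = "http://127.0.0.1:6800"  # assumed Scrapyd address; adjust to yours
PROJECT = "myproject"              # hypothetical project name
SPIDERS = ["spider_a", "spider_b", "spider_c"]  # hypothetical spider names

# Batch start: schedule.json starts one spider per request.
for spider in SPIDERS:
    r = requests.post(f"{SCRAPYD}/schedule.json",
                      data={"project": PROJECT, "spider": spider})
    print(spider, r.json())  # {"status": "ok", "jobid": "..."} on success

# Batch stop: list the running jobs, then cancel each one via cancel.json.
jobs = requests.get(f"{SCRAPYD}/listjobs.json",
                    params={"project": PROJECT}).json()
for job in jobs.get("running", []):
    r = requests.post(f"{SCRAPYD}/cancel.json",
                      data={"project": PROJECT, "job": job["id"]})
    print(job["id"], r.json())
```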
OK, thanks!
Is ScrapydWeb a true distributed crawler? Does it use a shared queue like scrapy-redis, or does each crawler run independently?
ScrapydWeb is a web application for Scrapyd cluster management, not a crawler. Have you read the readme and these tutorials?
my …
@kachacha …
Fail to decode json from http://scrapyd********.local:80/logs/stats.json: Expecting value: line 2 column 1 (char 1) |
This problem has been solved. It was a port problem.
Post the full log and the content of stats.json.
Run 'pip install logparser' on host 'scrapyd1.****.local:80' and run the command 'logparser' to show crawled_pages and scraped_items.
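Once LogParser is running on the Scrapyd host, it keeps a stats.json file in Scrapyd's logs directory, which ScrapydWeb polls for pages and items. A quick way to inspect it yourself (host and port below are placeholders):

```python
import requests

# LogParser writes aggregated crawl stats next to Scrapyd's log files,
# so they are served at /logs/stats.json on the Scrapyd web port.
url = "http://127.0.0.1:6800/logs/stats.json"
data = requests.get(url, timeout=5).json()  # raises ValueError if the body isn't JSON
print(sorted(data.keys()))  # inspect the structure; per-spider pages/items live inside
```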
I am running the service with Docker.
Check out the .db files in the data path on your localhost to confirm that it’s mounted successfully.
Hello, I'd like to know where this went wrong. Thanks.
Go to the user home directory (e.g. /home/yourusername) and try again.
Thanks. I'm using Python 2.7. After changing print(u"xxx") on line 124 of run.py to print("xxx"), it runs. Many thanks.
@foreverbeyoung …
Hello, I am able to set up Scrapyd and ScrapydWeb and get them running as separate Docker containers, but the pages and items are N/A, not a number as shown in your example at http://scrapydweb.herokuapp.com. The crawlers run properly without any errors shown.
When run, I expected some items from customspider in … If possible, I would like to know how you set the items setting for Scrapyd and ScrapydWeb on that site, so that I could try modifying it to work in my implementation.
@panithan-b …
Hello, I've tried running Scrapyd, ScrapydWeb, and LogParser outside Docker and got a log file like this; I'm still not sure why most fields are "N/A" or null.
What’s the content of the file “/spider/logs/tzspider/cosmenet/7dc32862e10311e9bb640242ac130002.log"?
It was a blank file. But now I know what happened: I had set LOG_LEVEL to ERROR in the Scrapy config for the staging environment, so it prints nothing but ERROR-level messages when it runs properly. After setting it to INFO, I can finally see the log content, the number of pages crawled, and the scraped items. :)
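For anyone hitting the same N/A symptom: LogParser extracts pages and items from the INFO-level stats that Scrapy writes into its log, so the log level must not be raised above INFO. A minimal illustration in the Scrapy project's settings:

```python
# settings.py of the Scrapy project
# LOG_LEVEL = 'ERROR' suppresses the periodic stats lines that LogParser
# parses, which makes ScrapydWeb show N/A for pages and items.
LOG_LEVEL = 'INFO'
```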
Hello, how do I make send_text work? I want to use email alerts, but I don't understand what the code on the send_text page does. Thank you.
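For context, an email alert ultimately just sends a message over SMTP. A generic sketch using Python's standard library (the server, credentials, and addresses are placeholders, and this is not ScrapydWeb's own implementation):

```python
import smtplib
from email.message import EmailMessage

# Placeholder values: substitute your own SMTP server and account details.
msg = EmailMessage()
msg["Subject"] = "ScrapydWeb alert"
msg["From"] = "alert@example.com"
msg["To"] = "you@example.com"
msg.set_content("A monitored log threshold was reached.")

with smtplib.SMTP("smtp.example.com", 587, timeout=10) as server:
    server.starttls()  # upgrade the connection to TLS before logging in
    server.login("alert@example.com", "app-password")
    server.send_message(msg)
```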
Hi, when I run the ninth crawler on each server, it shows 'waiting to run'. How do I increase the number of crawlers that each server runs? Looking forward to your answer.
Set it in the config of Scrapyd: ...\scrapyd\default_scrapyd.conf
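The relevant Scrapyd options are max_proc and max_proc_per_cpu: with the default max_proc = 0, the limit is max_proc_per_cpu (default 4) times the number of CPUs, so if each of your machines has 2 cores, only 8 jobs run at once and the ninth waits. A hedged example for default_scrapyd.conf (the value 8 is illustrative):

```
[scrapyd]
# 0 means the limit is max_proc_per_cpu multiplied by the number of CPUs
max_proc = 0
# default is 4; raise it to run more concurrent jobs per CPU
max_proc_per_cpu = 8
```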
Thank you.
Why does my timer task always run twice?
Hello, can you record a full ScrapydWeb tutorial video? I followed the tutorials on the Internet and ran into all kinds of errors.
An error occurred while uploading the project: 'str' object has no attribute 'decode'.
Some of my existing projects have a lot of ERROR and WARNING log entries. Every time I restart ScrapydWeb, the log alert threshold is reached, resulting in a lot of emails being pushed to me. Is there any way to solve this?
Are the new emails triggered by any active spider job?
Yes. When I restart ScrapydWeb, spiders are still running.
linux:HTTPConnectionPool(host='192.168.0.24', port=6801): Max retries exceeded with url: /listprojects.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f0a78b2d828>: Failed to establish a new connection: [Errno 111] Connection refused',))
windows:HTTPConnectionPool(host='localhost', port=6801): Max retries exceeded with url: /jobs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000000004589CC0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))
How should I solve this?
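Both tracebacks report "connection refused", which means nothing is listening on port 6801 at those addresses; note that Scrapyd's default port is 6800, so double-check the port Scrapyd actually binds and the matching entries in the ScrapydWeb settings. A quick hedged connectivity check (hosts and port taken from the logs above):

```python
import socket

# "Connection refused" means no service accepted the TCP connection;
# verify that Scrapyd is really listening on the configured host:port.
for host, port in [("192.168.0.24", 6801), ("localhost", 6801)]:
    try:
        socket.create_connection((host, port), timeout=3).close()
        print(f"{host}:{port} is reachable")
    except OSError as exc:
        print(f"{host}:{port} unreachable: {exc}")
```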