User Guide | Q&A | 用户指南 | 问答 #7
How should I set up the config file? Could you share a template for a production environment?
Make sure that Scrapyd has been installed and started.
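A quick way to confirm that Scrapyd is up is to poll its daemonstatus.json endpoint; a minimal sketch, assuming the default host and port:

```python
import requests

# Scrapyd listens on port 6800 by default; adjust host/port to your deployment.
resp = requests.get("http://127.0.0.1:6800/daemonstatus.json", timeout=5)
resp.raise_for_status()
print(resp.json())  # e.g. {"status": "ok", "running": 0, "pending": 0, ...}
```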
FYI: How to efficiently manage your distributed web scraping projects
Leave your comment here for help, in English if possible.
CPU usage is too high when I refresh the statistics, and the web interface keeps loading for a long time. I am using ScrapydWeb to manage 25 Scrapyd server nodes on five virtual machines in the same LAN. How can I solve this?
@zhangshengchun Try the latest version v1.0.0 (which fixed issue 14) and remember to set …
@my8100 The problem of high CPU usage has been solved. Thanks a lot!
@my8100 ScrapydWeb works very well; it's a great project! Now I have a question: the finished job list in …
@zhangshengchun You have to manually restart the Scrapyd service. (Note that this operation would also reset the running jobs, since Scrapyd keeps all job data in memory.) That's why ScrapydWeb displays the finished jobs in descending order.
@zhangshengchun Now you can set the DASHBOARD_FINISHED_JOBS_LIMIT option in v1.1.0 to control the number of finished jobs displayed.
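For reference, ScrapydWeb options like this live in its Python settings file; a minimal sketch, assuming the usual settings-file layout (the exact file name varies by version, and the value is illustrative):

```python
# scrapydweb_settings_vN.py — the settings file ScrapydWeb loads on startup
DASHBOARD_FINISHED_JOBS_LIMIT = 100  # cap the finished jobs shown on the dashboard
```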
Do you have a Docker deploy version?
Why not try it yourself and share with us?
OK. If there is one, I don't have to do it. If not, I will do it and share it.
FYI
Hello, I have some questions. How can I start more than one spider on a single Scrapyd server without using a timer task? In my project, I need to start lots of spiders at the same time (like a batch start), or stop all the spiders on one Scrapyd server. Thanks for your reply!
It's not supported yet, as I don't think it's a common practice. You can use the Requests library in a Python script instead.
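For example, a minimal sketch that batch-starts and batch-stops spiders through Scrapyd's own HTTP API (the address, project name, and spider names below are placeholders):

```python
import requests

SCRAPYD = "http://127.0.0.1:6800"  # assumed Scrapyd address; adjust to yours
PROJECT = "myproject"              # hypothetical project name
SPIDERS = ["spider_a", "spider_b", "spider_c"]  # hypothetical spider names

# Batch start: schedule.json starts one spider per request.
for spider in SPIDERS:
    r = requests.post(f"{SCRAPYD}/schedule.json",
                      data={"project": PROJECT, "spider": spider})
    print(spider, r.json())  # {"status": "ok", "jobid": "..."} on success

# Batch stop: list the running jobs, then cancel each one via cancel.json.
jobs = requests.get(f"{SCRAPYD}/listjobs.json",
                    params={"project": PROJECT}).json()
for job in jobs.get("running", []):
    r = requests.post(f"{SCRAPYD}/cancel.json",
                      data={"project": PROJECT, "job": job["id"]})
    print(job["id"], r.json())
```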
OK, thanks!
Is ScrapydWeb a true distributed crawler? Does it use a shared queue like scrapy-redis, or does each crawler run independently?
ScrapydWeb is a web application for Scrapyd cluster management, not a crawler. Have you read the readme and these tutorials?
my …
@kachacha …
Fail to decode json from http://scrapyd********.local:80/logs/stats.json: Expecting value: line 2 column 1 (char 1) |
This problem has been solved. It was a port problem.
Post the full log and the content of stats.json.
Run 'pip install logparser' on host 'scrapyd1.****.local:80' and run the command 'logparser' to show crawled_pages and scraped_items.
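Once LogParser is running on the Scrapyd host, it keeps a stats.json file in Scrapyd's logs directory, which ScrapydWeb polls for pages and items. A quick way to inspect it yourself (host and port below are placeholders):

```python
import requests

# LogParser writes aggregated crawl stats next to Scrapyd's log files,
# so they are served at /logs/stats.json on the Scrapyd web port.
url = "http://127.0.0.1:6800/logs/stats.json"
data = requests.get(url, timeout=5).json()  # raises ValueError if the body isn't JSON
print(sorted(data.keys()))  # inspect the structure; per-spider pages/items live inside
```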
I am running the service with Docker.
Check out the .db files in the data path on your localhost to confirm that it’s mounted successfully.
Hello, I'd like to know where this went wrong. Thanks.
Go to the user home directory (e.g. /home/yourusername) and try again.
Thanks. I'm using Python 2.7. After changing print(u"xxx") on line 124 of run.py to print("xxx"), it runs. Many thanks.
@foreverbeyoung …
Hello, I am able to set up Scrapyd and ScrapydWeb and get them running as separate Docker containers, but the pages and items are N/A, not a number as shown in your example at http://scrapydweb.herokuapp.com. The crawlers run properly without any errors shown.
When run, I expected some items from customspider in … If possible, I would like to know how you set the items setting for Scrapyd and ScrapydWeb on that site, so that I could try modifying it to work in my implementation.
@panithan-b …
Hello, I've tried running Scrapyd, ScrapydWeb, and LogParser outside Docker and got a log file like this; I'm still not sure why most fields are "N/A" or null.
What’s the content of the file “/spider/logs/tzspider/cosmenet/7dc32862e10311e9bb640242ac130002.log"?
It was a blank file. But now I know what happened: I had set LOG_LEVEL to ERROR in the Scrapy config for the staging environment, so it prints nothing but ERROR-level messages when it runs properly. After setting it to INFO, I can finally see the log content, the number of pages crawled, and the scraped items. :)
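For anyone hitting the same N/A symptom: LogParser extracts pages and items from the INFO-level stats that Scrapy writes into its log, so the log level must not be raised above INFO. A minimal illustration in the Scrapy project's settings:

```python
# settings.py of the Scrapy project
# LOG_LEVEL = 'ERROR' suppresses the periodic stats lines that LogParser
# parses, which makes ScrapydWeb show N/A for pages and items.
LOG_LEVEL = 'INFO'
```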
Hello, how do I make send_text work? I want to use email alerts, but I don't understand what the code on the send_text page does. Thank you.
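For context, an email alert ultimately just sends a message over SMTP. A generic sketch using Python's standard library (the server, credentials, and addresses are placeholders, and this is not ScrapydWeb's own implementation):

```python
import smtplib
from email.message import EmailMessage

# Placeholder values: substitute your own SMTP server and account details.
msg = EmailMessage()
msg["Subject"] = "ScrapydWeb alert"
msg["From"] = "alert@example.com"
msg["To"] = "you@example.com"
msg.set_content("A monitored log threshold was reached.")

with smtplib.SMTP("smtp.example.com", 587, timeout=10) as server:
    server.starttls()  # upgrade the connection to TLS before logging in
    server.login("alert@example.com", "app-password")
    server.send_message(msg)
```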
Hi, when I run the ninth crawler on each server, it shows 'waiting to run'. How do I increase the number of crawlers that each server runs? Looking forward to your answer.
Set it in the config of Scrapyd: ...\scrapyd\default_scrapyd.conf
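The relevant Scrapyd options are max_proc and max_proc_per_cpu: with the default max_proc = 0, the limit is max_proc_per_cpu (default 4) times the number of CPUs, so if each of your machines has 2 cores, only 8 jobs run at once and the ninth waits. A hedged example for default_scrapyd.conf (the value 8 is illustrative):

```
[scrapyd]
# 0 means the limit is max_proc_per_cpu multiplied by the number of CPUs
max_proc = 0
# default is 4; raise it to run more concurrent jobs per CPU
max_proc_per_cpu = 8
```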
Thank you.
Why does my timer task always run twice?
Hello, can you record a full ScrapydWeb tutorial video? I followed the tutorials on the Internet and ran into all kinds of errors.
An error occurred while uploading the project: 'str' object has no attribute 'decode'.
Some of my existing projects have a lot of ERROR and WARNING log entries. Every time I restart ScrapydWeb, the log alert threshold is reached, resulting in a lot of emails being pushed to me. Is there any way to solve this?
Are the new emails triggered by any active spider job?
Yes. When I restart ScrapydWeb, spiders are still running.
linux:HTTPConnectionPool(host='192.168.0.24', port=6801): Max retries exceeded with url: /listprojects.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f0a78b2d828>: Failed to establish a new connection: [Errno 111] Connection refused',))
windows:HTTPConnectionPool(host='localhost', port=6801): Max retries exceeded with url: /jobs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000000004589CC0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))
How should I solve this?
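Both tracebacks report "connection refused", which means nothing is listening on port 6801 at those addresses; note that Scrapyd's default port is 6800, so double-check the port Scrapyd actually binds and the matching entries in the ScrapydWeb settings. A quick hedged connectivity check (hosts and port taken from the logs above):

```python
import socket

# "Connection refused" means no service accepted the TCP connection;
# verify that Scrapyd is really listening on the configured host:port.
for host, port in [("192.168.0.24", 6801), ("localhost", 6801)]:
    try:
        socket.create_connection((host, port), timeout=3).close()
        print(f"{host}:{port} is reachable")
    except OSError as exc:
        print(f"{host}:{port} unreachable: {exc}")
```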