-50 NOTCRAWLED https://podcrto.si/ #258

Open
mkragelj opened this issue May 27, 2019 · 1 comment

Comments

@mkragelj

Hi,

I am trying to harvest this site: https://podcrto.si
As a national library, we harvest several domains to preserve their content.
I tried both Heritrix 1.14.4 and 3.4, without success.

I'm getting this:

[code] [status] [seed] [redirect]
-50 NOTCRAWLED https://podcrto.si/
200 CRAWLED https://e-uprava.gov.si/

and

LONGEST#2:
Queue si,podcrto, (p3)
2 items
wakes in: 13m45s77ms
last enqueued: https://podcrto.si/robots.txt
last peeked: https://podcrto.si/robots.txt
total expended: 15 (total budget: -1)
active balance: 2985
last(avg) cost: 1(1)
totalScheduled: 3
fetchSuccesses: 1
fetchFailures: 0
fetchDisregards: 0
fetchResponses: 1
robotsDenials: 0
successBytes: 54
totalBytes: 54
fetchNonResponses: 16
lastSuccessTime: 2019-05-23T07:14:29.825Z
SimplePrecedenceProvider
3
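
To make the two excerpts above easier to read side by side, here is a small Python sketch that parses the seeds-report row and the queue counters and lists what stands out. It is only an illustration: the names `STATUS_NOTES`, `parse_seed_line`, `parse_queue_counters` and `hints` are made up for this example and are not part of Heritrix. The note for code -50 reflects my understanding of the Heritrix status-code list (a temporary status for URIs still awaiting preconditions), so treat it as an assumption and check it against the documentation for your version.

```python
# Sketch only: helper names and STATUS_NOTES are hypothetical, not Heritrix APIs.
# The sample strings are copied from the report excerpts in this issue.

# Partial, hand-written lookup (assumption: -50 means "awaiting preconditions").
STATUS_NOTES = {
    -50: "temporary status: URI is still awaiting preconditions",
    200: "HTTP 200 OK",
}

def parse_seed_line(line):
    """Split one '[code] [status] [seed] [redirect]' row into a dict."""
    parts = line.split()
    return {
        "code": int(parts[0]),
        "status": parts[1],
        "seed": parts[2],
        "redirect": parts[3] if len(parts) > 3 else None,
    }

def parse_queue_counters(names, values):
    """Pair counter names with their values; purely numeric values become ints."""
    pairs = dict(zip(names.split(), values.split()))
    return {k: int(v) if v.isdigit() else v for k, v in pairs.items()}

def hints(seed, counters, last_enqueued):
    """Return human-readable observations; these are hints, not a diagnosis."""
    notes = []
    if seed["code"] in STATUS_NOTES:
        notes.append(f"{seed['seed']}: {STATUS_NOTES[seed['code']]}")
    if counters.get("fetchNonResponses", 0) > counters.get("fetchResponses", 0):
        notes.append("most fetch attempts on this queue got no response")
    if last_enqueued.endswith("/robots.txt"):
        notes.append("the queue is still retrying the robots.txt prerequisite")
    return notes

SEED_LINE = "-50 NOTCRAWLED https://podcrto.si/"
NAMES = ("totalScheduled fetchSuccesses fetchFailures fetchDisregards "
         "fetchResponses robotsDenials successBytes totalBytes "
         "fetchNonResponses lastSuccessTime")
VALUES = "3 1 0 0 1 0 54 54 16 2019-05-23T07:14:29.825Z"

seed = parse_seed_line(SEED_LINE)
counters = parse_queue_counters(NAMES, VALUES)
for note in hints(seed, counters, "https://podcrto.si/robots.txt"):
    print(note)
```

Read this way, the seed appears to be waiting on its robots.txt prerequisite while that fetch keeps getting no response (16 non-responses against 1 response). That points at the connection to the host rather than at scoping, but the excerpts alone do not prove it.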

Can anyone help or explain what could be the reason for this?
Thank you in advance.

Best,
Matjaž

@mkragelj

Update: it works with WCT 2.01.

Matjaž
