grobid does not return anything #1134
Comments
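Hi @naarkhoo, the default parameters of the langchain parser assume that you're running Grobid locally at localhost:8070. If this is the case, then to better investigate we would need to see the Grobid logs.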
|
Thanks for your response.
I do have the Grobid server running through Docker in the background and can
parse other PDF files, but not these two specific ones. I can share the log
if needed.
…On Tue, Jun 25, 2024 at 3:31 PM Luca Foppiano ***@***.***> wrote:
Hi @naarkhoo <https://github.com/naarkhoo>,
the default parameters of the langchain parser assume that you're running
Grobid locally at localhost:8070. See:
https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.parsers.grobid.GrobidParser.html
If this is the case, then to better investigate we would need to see the
Grobid logs.
If it's not the case, you should follow the instructions at
https://python.langchain.com/v0.2/docs/integrations/document_loaders/grobid/
The best approach is to install Grobid via docker, see https://grobid.readthedocs.io/en/latest/Grobid-docker/.
(Note: additional instructions can be found [here](https://python.langchain.com/v0.2/docs/integrations/providers/grobid/).)
Once grobid is up-and-running you can interact as described below.
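A quick way to confirm that a Grobid server really is answering at the address the langchain parser expects is to call Grobid's `/api/isalive` health endpoint. The snippet below is only a minimal sketch using the Python standard library: the function name `grobid_is_alive` and the `opener` parameter are hypothetical helpers introduced for illustration, and the default `http://localhost:8070` is the address assumed in this thread.

```python
from urllib.request import urlopen


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0, opener=urlopen):
    """Return True if GET <base_url>/api/isalive answers with 'true'.

    `opener` is a hypothetical hook (defaults to urllib's urlopen) so the
    check can be exercised without a running server.
    """
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8").strip().lower() == "true"
    except OSError:
        # URLError, connection refused and socket timeouts are all OSError subclasses.
        return False
```

Calling `grobid_is_alive()` before running the parser separates "the server is not reachable" from "the server is reachable but returns nothing for this PDF".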
|
Hi @naarkhoo, |
18628819.pdf here is the log for 8440333
I also attached the PDF file |
Thanks. I checked them and:
|
Thank you for looking into them. So you mean Grobid doesn't have an OCR engine and is only a layout parser?! Interesting that you say
I can make an issue on their repo and refer to this conversation. |
@naarkhoo One option may be that you hit the timeout. Could you please confirm that you are not getting any error message from langchain? Something like: |
Not really. I increased the
I tried to run the grobid python without langchain using
but it didn't succeed; it complains |
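To tell whether an empty result comes from langchain or from Grobid itself, one option is to post the PDF straight to Grobid's REST endpoint `/api/processFulltextDocument` (multipart form field `input`) with an explicit client-side timeout. This is a rough sketch with the standard library only, not the approach used by anyone in this thread: `build_multipart`, `process_fulltext`, the `opener` hook and the 120-second default are all assumptions made for illustration.

```python
import os
import uuid
from urllib.request import Request, urlopen


def build_multipart(field, filename, payload, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body; returns (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def process_fulltext(pdf_path, base_url="http://localhost:8070", timeout=120.0, opener=urlopen):
    """POST a PDF to Grobid and return (HTTP status, TEI XML as text)."""
    with open(pdf_path, "rb") as fh:
        body, ctype = build_multipart("input", os.path.basename(pdf_path), fh.read())
    req = Request(
        base_url.rstrip("/") + "/api/processFulltextDocument",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
    # Error statuses (for example a busy server) surface as urllib.error.HTTPError.
    with opener(req, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

If the call comes back with status 204 and an empty body, Grobid finished but could not extract any content from the file, which would put the problem on the Grobid side and not in the langchain wrapper. A socket timeout here would instead suggest raising the timeout or checking the server load.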
I finally found time to check this. Two comments:
|
Thanks so much for looking into this. I'll start looking into whether I can fix it myself and/or open an issue on langchain ... |
I am using grobid through langchain and have observed a weird behavior.
I hope you have access to the following papers:
pubmed.ncbi.nlm.nih.gov/8440333
pubmed.ncbi.nlm.nih.gov/18628819
For some reason, parsing them with grobid through langchain does not return anything, but it works through pypdfparser.
I wonder what could be the underlying reason?