If distances is empty in the 'gradient' option of semantic chunker it causes IndexError. #26221
Comments
Facing the same error here |
Me too |
@Aryazaky Could you please share the full code, including how you loaded the PDF, so that I can debug? Have you tried using a different breakpoint_threshold_amount instead of the default? |
@tibor-reiss Unfortunately, my Colab has been modified many times since, but I think this is how I did it:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader
loader = DirectoryLoader(docs_path, glob="**/*.pdf", loader_cls=PyMuPDFLoader)
docs = loader.load()
No, I haven't. I don't know what the valid value range for that is. What value do you suggest? |
I have tried different breakpoint_threshold_amount values, but I haven't got it working. |
@Aryazaky The range is 0.0 to 100.0. |
@Aryazaky The second pdf fails because the 3rd page splits into 2 sentences with the default regex. This results Options:
|
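A minimal sketch of the kind of guard that avoids this failure, assuming the gradient approach needs at least two sentence distances to be meaningful. The helpers `finite_differences` and `chunk_by_gradient` are hypothetical names for illustration, not langchain's API; plain finite differences and an absolute threshold stand in for the library's gradient and percentile logic.

```python
from typing import List


def finite_differences(values: List[float]) -> List[float]:
    """Central differences in the interior, one-sided at both ends."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to compute a gradient")
    grads = [values[1] - values[0]]
    for i in range(1, n - 1):
        grads.append((values[i + 1] - values[i - 1]) / 2.0)
    grads.append(values[-1] - values[-2])
    return grads


def chunk_by_gradient(
    sentences: List[str], distances: List[float], threshold: float
) -> List[str]:
    """Hypothetical helper: distances[i] is the distance between
    sentences[i] and sentences[i + 1]; break after sentence i when the
    gradient at i exceeds the (absolute, illustrative) threshold."""
    # Guard: with zero or one distance there is no gradient to compute,
    # so return the text as a single chunk instead of raising.
    if len(distances) < 2:
        return [" ".join(sentences)] if sentences else []

    grads = finite_differences(distances)
    chunks: List[str] = []
    start = 0
    for i, grad in enumerate(grads):
        if grad > threshold:
            chunks.append(" ".join(sentences[start : i + 1]))
            start = i + 1
    # Whatever remains after the last breakpoint forms the final chunk.
    if start < len(sentences):
        chunks.append(" ".join(sentences[start:]))
    return chunks
```

With this guard, a page that yields only two sentences comes back as one chunk rather than raising an exception.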
Fixes langchain-ai#26221
Co-authored-by: Erick Friis
Checked other resources
Example Code
Error Message and Stack Trace (if applicable)
Description
I'm trying to use the semantic chunker's gradient option to split text. It works well for one PDF but fails for another, and I don't know why. The percentile option works for both PDFs. I think this is a bug either in langchain or in the embedding model I use, so I'm submitting the bug report to langchain first.
System Info
Python 3 Google Compute Engine backend