TypeError: Invalid input type 'PdfDocument' #235
Comments
same problem ;)
pip install pdftext==0.3.7
pip install marker_pdf==0.2.6
(Mac, Intel). Reference: #183
Looks like some caller tries to pass a PdfDocument object where pdftext expects a file path.
Update: see VikParuchuri's answer in VikParuchuri/pdftext#10 (comment): "I think the issues there were with mismatched pdftext/marker versions"
This is the right answer :) For those of us still using Python 3.11, I'd love to see a 0.2.6.post1 which pins pdftext to 0.3.7 :)
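For anyone who wants to confirm which versions are actually installed before reinstalling, here is a minimal sketch using the standard library's importlib.metadata. The pins come from the comment above; the distribution names ("pdftext", "marker-pdf") and the helper name find_mismatches are assumptions for illustration, not part of marker or pdftext.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the comment above. The distribution names are assumed
# to match what pip reports for these packages.
KNOWN_GOOD = {"pdftext": "0.3.7", "marker-pdf": "0.2.6"}


def find_mismatches(pins, get_version=version):
    """Return {name: (expected, installed)} for every pin that does not match.

    `installed` is None when the package is not installed at all.
    `get_version` is injectable so the check can be exercised without
    touching the real environment.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in find_mismatches(KNOWN_GOOD).items():
        print(f"{name}: expected {expected}, found {installed}")
```

If the script prints nothing, both packages match the pinned versions; any line it does print names a package worth reinstalling at the pinned version.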
I encountered the following error when running marker_single:
(venv) (base) MacBook-Pro-2:contract-master dylan$ marker_single /Users/dylan/xxxx.pdf /Users/dylan --language Chinese
Loading detection model vikp/surya_det2 on device cpu with dtype torch.float32
Loading detection model vikp/surya_layout2 on device cpu with dtype torch.float32
Loading reading order model vikp/surya_order on device mps with dtype torch.float16
Loaded texify model to mps with torch.float16 dtype
Traceback (most recent call last):
  File "/Users/dylan/ai/contract-master/venv/bin/marker_single", line 8, in <module>
    sys.exit(main())
  File "/Users/dylan/ai/contract-master/venv/lib/python3.10/site-packages/convert_single.py", line 26, in main
    full_text, images, out_meta = convert_single_pdf(fname, model_lst, max_pages=args.max_pages, langs=langs, batch_multiplier=args.batch_multiplier)
  File "/Users/dylan/ai/contract-master/venv/lib/python3.10/site-packages/marker/convert.py", line 65, in convert_single_pdf
    pages, toc = get_text_blocks(
  File "/Users/dylan/ai/contract-master/venv/lib/python3.10/site-packages/marker/pdf/extract_text.py", line 85, in get_text_blocks
    char_blocks = dictionary_output(doc, page_range=page_range, keep_chars=True)
  File "/Users/dylan/ai/contract-master/venv/lib/python3.10/site-packages/pdftext/extraction.py", line 75, in dictionary_output
    pages = _get_pages(pdf_path, model, page_range, workers=workers)
  File "/Users/dylan/ai/contract-master/venv/lib/python3.10/site-packages/pdftext/extraction.py", line 26, in _get_pages
    pdf_doc = pdfium.PdfDocument(pdf_path)
  File "/Users/dylan/ai/contract-master/venv/lib/python3.10/site-packages/pypdfium2/_helpers/document.py", line 78, in __init__
    self.raw, to_hold, to_close = _open_pdf(self._input, self._password, self._autoclose)
  File "/Users/dylan/ai/contract-master/venv/lib/python3.10/site-packages/pypdfium2/_helpers/document.py", line 674, in _open_pdf
    raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
TypeError: Invalid input type 'PdfDocument'
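Reading the traceback bottom-up, marker hands an already-opened document to pdftext, which then tries to open it again as if it were a path. The toy model below reproduces that shape of failure. FakeDocument, extract_from_path, and try_extract are hypothetical stand-ins, and the set of accepted input types is an assumption; none of this is the actual pypdfium2 or pdftext code.

```python
from pathlib import Path


class FakeDocument:
    """Hypothetical stand-in for a PDF document class (not pypdfium2's code)."""

    def __init__(self, input_data):
        # Assumption: only path-like or raw-bytes input is accepted.
        if not isinstance(input_data, (str, Path, bytes)):
            raise TypeError(f"Invalid input type '{type(input_data).__name__}'")
        self.source = input_data


def extract_from_path(pdf_path):
    """Models a callee that expects a path and opens the document itself."""
    return FakeDocument(pdf_path)


def try_extract(value):
    """Return 'ok' on success, or the TypeError message on failure."""
    try:
        extract_from_path(value)
        return "ok"
    except TypeError as exc:
        return str(exc)


# The caller opens the document first, then passes the object instead of the path.
already_open = FakeDocument("example.pdf")

print(try_extract("example.pdf"))   # a path is accepted
print(try_extract(already_open))    # an opened document is rejected
```

The second call fails for the same reason as the traceback: caller and callee disagree about whether the argument is a path or an opened document, which is consistent with the version-mismatch explanation in the comments.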