PDF document content is recognized incorrectly #685

Open
1 task done
llm163520 opened this issue Oct 12, 2024 · 9 comments

Comments

@llm163520

Issues

  • I have browsed through the Issues and confirmed that this is not a duplicate.

Umi-OCR version

2.1.4

Windows version

Windows 10

OCR plugin used

PaddleOCR

Reproduction steps

1. Use batch document recognition.
2. Click "Start task".

Problem screenshots or related files (optional)

image
20240729185649698-6421.pdf

@llm163520
Author

image
The recognized content is wrong; see the image above for the recognition result.

@hiroi-sora
Owner

hiroi-sora commented Oct 12, 2024

Try: Batch Documents tab → Settings → Content extraction mode → Full-page forced OCR

@llm163520
Author

Try: Batch Documents tab → Settings → Content extraction mode → Full-page forced OCR

I tried that and it did not work.

@hiroi-sora
Owner

If possible, please upload the PDF file so I can take a look.

@llm163520
Copy link
Author

If possible, please upload the PDF file so I can take a look.

The file has already been uploaded:
https://github.com/user-attachments/files/17348843/20240729185649698-6421.pdf

@hiroi-sora
Owner

hiroi-sora commented Oct 14, 2024

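Oh, I see. Your file is rotated by 90°, so you need to check [Correct text orientation] for it to be recognized properly.

Also, for the layout parsing scheme I recommend [No processing], so that the result is not thrown off by the orientation.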

image

@llm163520
Author

How should this be handled when using the API?

@hiroi-sora
Owner

How should this be handled when using the API?

You can enable this feature by passing a parameter; see the documentation for details.

https://github.com/hiroi-sora/Umi-OCR/blob/main/docs/http/api_doc.md#/api/doc

image
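
For API users, the two GUI settings recommended above might map onto request options roughly as follows. This is only a minimal sketch, not the documented method: the option keys `ocr.cls` and `tbpu.parser`, the value `"none"`, the `/api/doc/upload` path, the `file` and `json` form fields, and the address `http://127.0.0.1:1224` are all assumptions that should be checked against the linked API doc. `build_doc_options` and `upload_pdf` are hypothetical helper names, and `requests` is a third-party package.

```python
import json

# Assumed address of a locally running Umi-OCR HTTP service; adjust as needed.
BASE_URL = "http://127.0.0.1:1224"


def build_doc_options(correct_direction=True, parser="none"):
    """Build the options dict for a document-recognition task.

    Both key names are assumptions; verify them against the API doc.
    """
    return {
        # Assumed key for the "Correct text orientation" switch.
        "ocr.cls": correct_direction,
        # Assumed key and value for the layout parsing scheme "No processing".
        "tbpu.parser": parser,
    }


def upload_pdf(path, options, base_url=BASE_URL):
    """Upload a PDF together with its options and return the parsed JSON reply.

    The endpoint path and form-field names are assumptions.
    """
    import requests  # third-party; imported here so the helpers above work without it

    with open(path, "rb") as f:
        resp = requests.post(
            f"{base_url}/api/doc/upload",
            files={"file": f},
            data={"json": json.dumps(options)},
            timeout=60,
        )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    # Only prints the options; call upload_pdf(...) yourself once the
    # service address and option names have been confirmed.
    print(json.dumps(build_doc_options(), ensure_ascii=False))
```

Keeping the options in a small builder makes it easy to compare them with whatever option list your Umi-OCR version actually reports before sending a file.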

@llm163520
Author

That works now, thanks!
