Incorrect recognition of annotation numbers and watermarks #947

Closed
simplew2011 opened this issue Nov 13, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@simplew2011

Description of the bug

image

How to reproduce the bug

Operating system

Linux

Python version

3.9

Software version (magic-pdf --version)

0.9.x

Device mode

cuda

@simplew2011 simplew2011 added the bug Something isn't working label Nov 13, 2024
@myhloli
Collaborator

myhloli commented Nov 14, 2024

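  1. Try opening the PDF directly in a browser, copying the marker text, and pasting it into a text editor. You will see that although the markers are displayed as ①②③, the characters actually stored in the document are abc. Extracting the markers as abc when reading the text layer is therefore expected behavior.
  2. The current version already includes watermark-removal logic, but it only handles large watermarks that span multiple blocks. Small watermarks that are completely enclosed by a single block cannot be detected or removed by the current logic.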
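To make point 1 concrete, here is a minimal sketch for checking which code points a pasted string really contains, plus a possible post-processing workaround. It is not part of magic-pdf. The helper names `describe` and `to_circled` are hypothetical, and the mapping of `a`, `b`, `c`, ... to ①, ②, ③, ... is an assumption about this particular PDF's font that would need to be verified per document.

```python
import unicodedata

# Assumption: the PDF's font draws 'a', 'b', 'c', ... as circled 1, 2, 3, ...
# This mapping is hypothetical and specific to the document in question.
CIRCLED_ONE = 0x2460  # code point of "①"; U+2460..U+2473 cover ① to ⑳


def describe(text: str) -> list[str]:
    """List the code point and Unicode name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]


def to_circled(marker: str) -> str:
    """Map a single lowercase letter a..t to the circled number ①..⑳."""
    if len(marker) == 1 and "a" <= marker <= "t":
        return chr(CIRCLED_ONE + ord(marker) - ord("a"))
    return marker  # anything else is returned unchanged


if __name__ == "__main__":
    # Text pasted from the PDF: the stored characters are plain letters.
    for line in describe("abc"):
        print(line)
    # Optional workaround: restore the displayed form after extraction.
    print("".join(to_circled(ch) for ch in "abc"))
```

Applying `to_circled` blindly would also rewrite ordinary letters, so in practice it would only make sense on spans already known to be list markers.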
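The distinction in point 2 can be illustrated geometrically. The sketch below is not the project's actual watermark logic. It only assumes that watermarks and layout blocks are axis-aligned boxes `(x0, y0, x1, y1)`, and the function names and the "spanning" / "enclosed" / "other" labels are hypothetical.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def contains(outer: BBox, inner: BBox) -> bool:
    """True if inner lies completely inside outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def overlaps(a: BBox, b: BBox) -> bool:
    """True if the two boxes share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def classify_watermark(mark: BBox, blocks: List[BBox]) -> str:
    """Label a candidate watermark by how it relates to the layout blocks."""
    if any(contains(block, mark) for block in blocks):
        return "enclosed"  # small mark inside one block: the unhandled case
    hits = sum(overlaps(mark, block) for block in blocks)
    return "spanning" if hits >= 2 else "other"


if __name__ == "__main__":
    blocks = [(0, 0, 100, 50), (0, 60, 100, 110)]
    print(classify_watermark((10, 10, 90, 100), blocks))  # crosses both blocks
    print(classify_watermark((20, 10, 40, 30), blocks))   # inside the first block
```

Under this simplified model, a mark labeled "enclosed" looks the same geometrically as ordinary content inside its block, which is one plausible reason it is harder to separate than a mark that crosses block boundaries.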
