Latest version does not recognize footnote content in documents that contain footnotes #956

Open
wahahaer opened this issue Nov 14, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@wahahaer

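Description of the bug

Currently using:
magic-pdf, version 0.9.2

See the example image below. With the official default settings, the output Markdown does not include the footnote content.
With the earlier magic-pdf, version 0.7.0b1, however, the footnotes were parsed and recognized.

I don't know what changed in the code.
Or is there a parameter that controls whether footnote content is ignored?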
ClipBoard

How to reproduce the bug

magic-pdf -p . -o .

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.9.x

Device mode

cuda

wahahaer added the bug label Nov 14, 2024
@wahahaer
Author

ClipBoard

@wahahaer
Author

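For some reason the image never displays. It sits below the text "Or is there a parameter that controls whether footnote content is ignored?"; click where the cursor turns into a hand.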

@myhloli
Collaborator

myhloli commented Nov 14, 2024

Leaving footnotes out of the output is expected behavior. If you need this content, please extract it yourself by parsing the discarded_blocks field in xxx_middle.json.
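For anyone landing here, a minimal sketch of that approach in Python. Only the file name pattern `xxx_middle.json` and the `discarded_blocks` field come from the reply above. The rest is an assumption: that the text of each block is stored in strings under `content` keys nested somewhere inside it, and the helper names (`iter_discarded_blocks`, `block_text`, `discarded_texts`) are hypothetical, not part of magic-pdf. The walker is deliberately schema-agnostic, so it should tolerate layout differences between versions, but check it against your own output file.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_discarded_blocks(node: Any) -> Iterator[dict]:
    """Yield every dict listed under any "discarded_blocks" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "discarded_blocks" and isinstance(value, list):
                for block in value:
                    if isinstance(block, dict):
                        yield block
            else:
                yield from iter_discarded_blocks(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_discarded_blocks(item)


def block_text(block: Any) -> str:
    """Join all strings stored under "content" keys inside one block.

    Assumption: text lives in nested "content" fields; adjust the key
    name if your middle.json uses a different layout.
    """
    parts: list[str] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "content" and isinstance(value, str):
                    parts.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(block)
    return " ".join(parts)


def discarded_texts(middle_json_path: str) -> list[str]:
    """Return the non-empty text of every discarded block in the file."""
    data = json.loads(Path(middle_json_path).read_text(encoding="utf-8"))
    texts = (block_text(block) for block in iter_discarded_blocks(data))
    return [text for text in texts if text]
```

Call `discarded_texts("xxx_middle.json")` with the path to your own output file. The discarded blocks may hold more than footnotes (anything the pipeline chose to drop), so expect to filter the result, for example by a block's type or position if those fields are present in your file.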

2 participants