pdf.Page.GetTextByRow returns text in the wrong order for some PDF files #16

Open
cfh0081 opened this issue Jun 17, 2021 · 1 comment

Comments


cfh0081 commented Jun 17, 2021

I found that calling pdf.Page.GetTextByRow returns disordered text for some PDF files. For example, I got "761" where the text should read "176".
The cause appears to be that page.go sorts with sort.Sort, which is not stable. Replacing it with sort.Stable solves the problem.
pdf.Page.GetTextByColumn needs the same change.
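To show why stability matters: when two text fragments compare as equal under the sort key (for example, they share the same X coordinate), sort.Sort is free to reorder them, while sort.Stable keeps them in their original order. This is a minimal sketch, assuming the three digits share one X value. The `glyph` type, `byX` ordering and `joinStable` helper are hypothetical and are not the library's actual types or the exact comparison used in page.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// glyph is a hypothetical stand-in for a positioned text fragment;
// it is not the library's own type.
type glyph struct {
	S string  // the character(s) drawn
	X float64 // horizontal position on the page
}

// byX orders glyphs left to right. Glyphs with equal X compare as equal.
type byX []glyph

func (g byX) Len() int           { return len(g) }
func (g byX) Less(i, j int) bool { return g[i].X < g[j].X }
func (g byX) Swap(i, j int)      { g[i], g[j] = g[j], g[i] }

// joinStable sorts a copy of the glyphs with sort.Stable, so glyphs whose
// X values tie keep the order in which they appeared in the input.
func joinStable(in []glyph) string {
	gs := append([]glyph(nil), in...)
	sort.Stable(byX(gs))
	var b strings.Builder
	for _, g := range gs {
		b.WriteString(g.S)
	}
	return b.String()
}

func main() {
	// Assumed input: three glyphs with the same X coordinate,
	// listed in content-stream order "1", "7", "6".
	row := []glyph{{"1", 100}, {"7", 100}, {"6", 100}}
	fmt.Println(joinStable(row)) // prints 176
}
```

With sort.Sort in place of sort.Stable, the Go sort package gives no guarantee about the relative order of the three tied glyphs, which is consistent with seeing "761" instead of "176".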


stuta commented May 13, 2022

I tried replacing sort.Sort with sort.Stable, but it did not fix this problem: the text is still not in the same order as the output of r.GetPlainText(). GetPlainText seems to produce text in the correct order, but without line feeds, which makes the text hard to read.
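One possible way to get readable output, sketched under assumptions: keep the fragments in their original content-stream order (as GetPlainText appears to) and insert a line break whenever the vertical position changes by more than a small tolerance. The `glyph` type, the `joinWithBreaks` function and the 2-unit tolerance are hypothetical and are not part of the library.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// glyph is a hypothetical positioned text fragment, not the library's type.
type glyph struct {
	S string  // the character(s) drawn
	Y float64 // vertical position (baseline)
}

// joinWithBreaks keeps glyphs in their original order (no sorting) and
// starts a new line whenever Y differs from the previous glyph by more than tol.
func joinWithBreaks(gs []glyph, tol float64) string {
	var b strings.Builder
	for i, g := range gs {
		if i > 0 && math.Abs(g.Y-gs[i-1].Y) > tol {
			b.WriteByte('\n')
		}
		b.WriteString(g.S)
	}
	return b.String()
}

func main() {
	// Assumed input: two rows of text, in content-stream order.
	gs := []glyph{{"1", 700}, {"7", 700}, {"6", 700}, {"a", 686}, {"b", 686}}
	fmt.Printf("%q\n", joinWithBreaks(gs, 2)) // prints "176\nab"
}
```

This avoids the ordering problem entirely because nothing is sorted, but it relies on the content stream drawing each row contiguously, which not every PDF does.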
