missing drop-capital in GT #29

Open
bertsky opened this issue May 23, 2019 · 2 comments
Labels
groundtruth Groundtruth quality issues

Comments

@bertsky
Contributor

bertsky commented May 23, 2019

Again, I do not know if this is systematic:

In weigel_gnothi02_1618, on page phys_0001 region TextRegion_1488379719413_342, a drop-capital is missing in the annotation, i.e. it became part of the adjacent paragraph region. Worse, its (larger) line height spilled over into tl_4, the first TextLine of the region, so it has a height of 369 pixels and overlaps tl_5 through tl_9.
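A symptom like this (one TextLine much taller than its siblings and overlapping several of them) could probably be found mechanically. Below is a minimal sketch, not an existing OCR-D tool: it assumes the line outlines are available as PAGE-style `points` strings keyed by line id, and the function names and the `factor` threshold are hypothetical choices for illustration.

```python
from statistics import median


def bbox(points):
    """Bounding box (x0, y0, x1, y1) of a PAGE-style points string 'x,y x,y ...'."""
    pairs = [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
    xs, ys = zip(*pairs)
    return min(xs), min(ys), max(xs), max(ys)


def boxes_intersect(a, b):
    """True if two bounding boxes share an area larger than zero."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def oversized_overlapping_lines(lines, factor=2.5):
    """Map each suspicious line id to the ids of the lines it overlaps.

    A line counts as suspicious when its bounding-box height exceeds
    `factor` times the median line height of the region (hypothetical
    heuristic) and it intersects at least one other line of that region.
    """
    boxes = {line_id: bbox(points) for line_id, points in lines.items()}
    if not boxes:
        return {}
    typical = median(box[3] - box[1] for box in boxes.values())
    report = {}
    for line_id, box in boxes.items():
        if box[3] - box[1] <= factor * typical:
            continue
        hits = [other for other, other_box in boxes.items()
                if other != line_id and boxes_intersect(box, other_box)]
        if hits:
            report[line_id] = hits
    return report
```

Calling `oversized_overlapping_lines` with the lines of one region returns a dict from each flagged line id to the ids it overlaps. The median keeps a single oversized line from skewing the reference height, but a region with only two or three lines would need a different baseline.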

@tboenig
Contributor

tboenig commented Jul 11, 2019

The fix for #31 will cover this too.

It's not a systematic error, but it's not a one-off either. There are 108 instances of drop-capital in the GT, of which maybe 70% need to be corrected.

Will be modelled according to https://ocr-d.github.io/gt//trans_documentation/lyInitiale.html

@bertsky
Contributor Author

bertsky commented Jul 11, 2019

Oh wow, that is a lot!

@tboenig, are you sure you want to use the Relations annotation described in the guidelines? I have never seen this in GT before. Not that I would object from a user perspective!

@kba I think we can support this by adding a validation for drop-capital to the PAGE validator. We would then need to ensure that a Relations element is present and uses @type=join. More generally, we could also detect missing drop-capitals via the new geometry validator we discussed (see OCR-D/core#252).
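To make the proposed check concrete, here is a rough sketch of what such a validation might look like. It is not the OCR-D validator's actual code. It assumes that drop-capitals are `TextRegion` elements with `type="drop-capital"`, and that a join is expressed as a `Relation` element with `type="join"` whose children reference regions through a `regionRef` attribute; those element and attribute names should be checked against the PAGE schema version in use. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix so the check works across PAGE versions."""
    return tag.rsplit("}", 1)[-1]


def unjoined_drop_capitals(page_xml):
    """Return sorted ids of drop-capital regions not referenced by any join relation.

    Assumption: a join looks like <Relation type="join"> with child
    elements that carry a regionRef attribute.
    """
    root = ET.fromstring(page_xml)
    drop_capitals = set()
    joined = set()
    for element in root.iter():
        name = local_name(element.tag)
        if name == "TextRegion" and element.get("type") == "drop-capital":
            if element.get("id"):
                drop_capitals.add(element.get("id"))
        elif name == "Relation" and element.get("type") == "join":
            for ref in element:
                if ref.get("regionRef"):
                    joined.add(ref.get("regionRef"))
    return sorted(drop_capitals - joined)
```

An empty result means every drop-capital region takes part in at least one join. The sketch does not verify that the joined partner is the adjacent paragraph, which a real validator would probably want to check as well.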

@cneud cneud added the groundtruth Groundtruth quality issues label Nov 5, 2019

3 participants