Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring of ocrd_tesserocr common functionality into core #268

Merged
merged 19 commits into from
Aug 21, 2019

Conversation

kba
Copy link
Member

@kba kba commented Aug 1, 2019

Start implementing OCR-D/ocrd_tesserocr#49

@kba kba requested review from wrznr and bertsky August 1, 2019 17:47
@codecov-io
Copy link

codecov-io commented Aug 1, 2019

Codecov Report

Merging #268 into master will decrease coverage by 5.07%.
The diff coverage is 42.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #268      +/-   ##
==========================================
- Coverage   98.07%   92.99%   -5.08%     
==========================================
  Files          30       30              
  Lines        1350     1485     +135     
  Branches      268      287      +19     
==========================================
+ Hits         1324     1381      +57     
- Misses         15       92      +77     
- Partials       11       12       +1
Impacted Files Coverage Δ
ocrd/ocrd/workspace.py 66.86% <16.66%> (-24.29%) ⬇️
ocrd_utils/ocrd_utils/__init__.py 76.07% <59.75%> (-16.52%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e678c0d...5332936. Read the comment docs.

Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I see a few possible improvements...

Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I see a few possible improvements...

ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/requirements.txt Show resolved Hide resolved
@bertsky
Copy link
Collaborator

bertsky commented Aug 4, 2019

Wait a minute! Where is coordinates_for_segment?

When you add this, please be sure to apply OCR-D/ocrd_tesserocr#68 as well!

kba added a commit to kba/ocrd_tesserocr that referenced this pull request Aug 7, 2019
@kba
Copy link
Member Author

kba commented Aug 7, 2019

Wait a minute! Where is coordinates_for_segment?

It's there now. The rest of common.py has been turned into workspace methods.

More tests would be wise but the ocrd_tesserocr test suite passes.

@kba kba changed the title [WIP] Refactoring of ocrd_tesserocr common functionality into core Refactoring of ocrd_tesserocr common functionality into core Aug 7, 2019
Conflicts:
	CHANGELOG.md
	ocrd/requirements.txt
	ocrd_modelfactory/requirements.txt
	tests/test_utils.py
Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good already. I am glad you adopted the Workspace method option. I will start a PR afterwards with test cases for the 3 new methods as well, if you like. (Based on assets, if that is not asking too much.)

(Please also see unresolved comments from last time.)

CHANGELOG.md Outdated Show resolved Hide resolved
ocrd/ocrd/workspace.py Show resolved Hide resolved
ocrd/ocrd/workspace.py Show resolved Hide resolved
@bertsky
Copy link
Collaborator

bertsky commented Aug 12, 2019

More tests would be wise but the ocrd_tesserocr test suite passes.

Yes, ocrd_tesserocr is very much in need of more test coverage now, and here image_from_page, image_from_segment and save_image_file should be controlled by tests. As for the latter, I believe we would need some real-life workspaces (as in assets) to cover this.

Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Becoming less and less certain of this. @kba what do you think, is this too much?

ocrd/ocrd/workspace.py Outdated Show resolved Hide resolved
ocrd/ocrd/workspace.py Outdated Show resolved Hide resolved
ocrd/ocrd/workspace.py Outdated Show resolved Hide resolved
ocrd/ocrd/workspace.py Outdated Show resolved Hide resolved
ocrd/ocrd/workspace.py Outdated Show resolved Hide resolved
@wrznr
Copy link
Contributor

wrznr commented Aug 21, 2019

Wrt. cropping vs. cutting (vs. segmenting?): Using the term cropping for localizing a page's border was a bad choice right from the start because it mixes the intellectual process of finding the borders and the physical process of separating the OCR-relevant from the irrelevant parts of the actual image. Using cutting does not improve things IMHO. The more I think about it, the more meaningful the use of the term (page-level) segmentation seems to me because this is what cropping right now does: It localizes the segment page on an image file. We could then use cropping as it is intended.

Copy link
Contributor

@wrznr wrznr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not let the terminological discussion slow us down.

@wrznr wrznr requested a review from bertsky August 21, 2019 08:09
@kba
Copy link
Member Author

kba commented Aug 21, 2019

We should not let the terminological discussion slow us down.

I second that, let's discuss in #289.

@kba kba merged commit 27fd169 into OCR-D:master Aug 21, 2019
@kba kba deleted the tesserocr-common branch August 21, 2019 09:45
Copy link
Collaborator

@bertsky bertsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @kba, I forgot to finalize my last review! Please fix in the next PR...

CHANGELOG.md Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
CHANGELOG.md Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Show resolved Hide resolved
@kba
Copy link
Member Author

kba commented Aug 21, 2019

@kba Do you want me to make the repairs myself and commit here?

I'll fix it right away, thanks for spotting.

Fixed in 5fd2875

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants