This issue was moved to a discussion.

cropping vs. cutting vs. segmenting #289

Closed
kba opened this issue Aug 21, 2019 · 9 comments
Labels: discussion (discussion / input from the group required)

@kba
Member

kba commented Aug 21, 2019

In the docstrings, cropping currently refers to tasks that could be better described as segmenting (finding regions) or cutting (doing the actual image manipulation).

This came up in #268 but finding the right terminology should not prevent a merge.

We should also extend the glossary.

Here are the pertinent comments on the terms:

@bertsky:

@wrznr is right about insisting that the term cropping (de: Beschneidung) should only apply to the process of finding the Border (and perhaps also removing the margins from the image by cutting), not of other elements down the hierarchy. This we should rather call cutting (de: Freistellen) – it is only due to PIL.Image.crop that I was led astray.
If this is correct, then the docstrings must be fixed accordingly throughout.

That would be more consequential, but I tend to say no: crop_image is meant as replacement for Image.crop and should be memorable. I am becoming less enthusiastic about this terminological distinction by the minute... maybe this should be reverted (sorry).

@wrznr:

Wrt. cropping vs. cutting (vs. segmenting?): Using the term cropping for localizing a page's border was a bad choice right from the start because it mixes the intellectual process of finding the borders and the physical process of separating the OCR-relevant from the irrelevant parts of the actual image. Using cutting does not improve things IMHO. The more I think about it, the more meaningful the use of the term (page-level) segmentation seems to me because this is what cropping right now does: It localizes the segment page on an image file. We could then use cropping as it is intended.
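To make the distinction in the quoted comments concrete, here is a minimal sketch that separates the detection step (finding the page frame) from the image operation (cutting out a rectangle). It is pure Python on a toy grayscale grid; the names `detect_page_frame` and `cut_to_box` are hypothetical and are not part of OCR-D or Pillow. The box uses the same (left, upper, right, lower) convention as `PIL.Image.crop`.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower); right/lower are exclusive
Image = List[List[int]]          # rows of grayscale pixels, 255 = white background


def detect_page_frame(image: Image, background: int = 255) -> Box:
    """Detection step: return the bounding box of all non-background pixels.

    Only coordinates are produced; the image itself is not modified.
    """
    rows = [y for y, row in enumerate(image) if any(p != background for p in row)]
    cols = [x for x in range(len(image[0])) if any(row[x] != background for row in image)]
    if not rows or not cols:
        raise ValueError("no foreground pixels found")
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def cut_to_box(image: Image, box: Box) -> Image:
    """Image operation: return the sub-image described by box."""
    left, upper, right, lower = box
    return [row[left:right] for row in image[upper:lower]]


scan = [
    [255, 255, 255, 255, 255],
    [255,   0,  10, 255, 255],
    [255,  20,   0, 255, 255],
    [255, 255, 255, 255, 255],
]
frame = detect_page_frame(scan)  # coordinates only
page = cut_to_box(scan, frame)   # new, smaller image
print(frame, page)
```

Keeping the two functions apart mirrors the terminological point: the first produces an annotation (what PAGE calls Border), the second manipulates pixels.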

@wrznr added the discussion (discussion / input from the group required) label Aug 21, 2019
@bertsky
Collaborator

bertsky commented Aug 21, 2019

This sounds very convincing to me. Except for one problem: (correct me if I am wrong, but) page segmentation usually refers to finding regions, not the border. It would make more sense to call that region segmentation, just as line segmentation creates lines, (so page segmentation would indeed be free for what we used to call cropping), but I never heard that.

@wrznr
Contributor

wrznr commented Aug 21, 2019

That's actually what we (@cneud, @kba and I) agreed on: to prefix segmentation with the result and not with the level of operation (i.e. segment image into X). You are absolutely right that page segmentation usually refers to segmentation of the page. But I prefer principled and sound solutions over traditions. 😁
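As a rough illustration of why the two naming conventions collide, the sketch below labels the same three steps once by their result and once by the unit they operate on. The step list and both helper functions are assumptions made up for this example; they are not taken from any OCR-D specification.

```python
from typing import List, Tuple

# Hypothetical workflow steps as (unit operated on, unit produced)
STEPS: List[Tuple[str, str]] = [
    ("image", "page"),
    ("page", "region"),
    ("region", "line"),
]


def name_by_result(source: str, result: str) -> str:
    """Convention described above: prefix with what the step produces."""
    return f"{result} segmentation"


def name_by_level(source: str, result: str) -> str:
    """Alternative convention: prefix with the level the step operates on."""
    return f"{source} segmentation"


by_result = [name_by_result(s, r) for s, r in STEPS]
by_level = [name_by_level(s, r) for s, r in STEPS]

# The same label ends up on different steps depending on the convention.
for (source, result), a, b in zip(STEPS, by_result, by_level):
    print(f"{source} -> {result}: by result = {a!r}, by level = {b!r}")
```

Under the first convention "page segmentation" names the image-to-page step, under the second it names the page-to-region step, which is the ambiguity discussed here.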

@bertsky
Collaborator

bertsky commented Aug 21, 2019

It is definitely a stumbling point for newcomers and users, but I am skeptical whether researchers can be convinced easily to adopt that change in terminology. (At the least, page segmentation would have to be disambiguated verbosely for a while.)

Another established term is page frame detection. This already distinguishes itself from the physical operation (of cropping / cutting). So it might be a compromise (and smaller deviation from tradition) to use cropping only as an image operation (not a workflow step) in OCR-D, and consistently use page frame detection for the process of finding Border. As an extra, one could also refrain from using page segmentation and (provocatively but unambiguously) use region segmentation instead.

@wrznr
Contributor

wrznr commented Aug 21, 2019

It is a pity that the PAGE element is called Border. Maybe we should go with border_detection on the operation levels page, region and line.

@bertsky
Collaborator

bertsky commented Aug 21, 2019

"It is a pity that the PAGE element is called Border. Maybe we should go with border_detection on the operation levels page, region and line."

You mean instead of segmentation?

"To prefix segmentation with the result and not with the level of operation (i.e. segment image into X)."

But that (new) principle could still not be applied for page segmentation (in the new sense): Border detection does not actually segment the source image. So even with region segmentation established, I do not see a place for page segmentation, except in a broader sense covering all levels of segmentation.

@wrznr
Contributor

wrznr commented Aug 21, 2019

Yeah! That's why I propose a completely new wording:

ocrd_tesserocr_detect_border -I ORIGINAL -O CROPPED -m mets.xml -p <(echo '{"operation_level": "page"}')
ocrd_tesserocr_detect_border -I CROPPED -O SEGMENT_REGION -m mets.xml -p <(echo '{"operation_level": "region"}')
ocrd_tesserocr_detect_border -I SEGMENT_REGION -O SEGMENT_LINE -m mets.xml -p <(echo '{"operation_level": "line"}')

I.e. forgoing the new principle.

@bertsky
Collaborator

bertsky commented Aug 21, 2019

I see. But the last 2 steps (region and line segmentation) do not actually detect any borders (i.e. outer limits) of regions and lines, they rather define those very regions and lines. IMHO we have no good reason to drop the term segmentation itself at this point.

Also, we should probably not concern ourselves much with the names of components or processors here – as these need to accommodate other considerations (like using imperative verb forms instead of abstract nouns, e.g. recognize for OCR, correct for OCR post-correction, rate for LM rescoring, or being true to the implementation rather than the general operation they offer) – as much as with the terms we use to describe the workflow steps in our documentation.

That being said, I don't find the existing naming scheme of ocrd_tesserocr all that bad – although I wouldn't mind a slight change like so:

ocrd-tesserocr-crop-page -I OCR-D-IMG -O OCR-D-SEG-PAGE
ocrd-tesserocr-segment-regions -I OCR-D-SEG-PAGE -O OCR-D-SEG-BLOCK
ocrd-tesserocr-segment-lines -I OCR-D-SEG-BLOCK -O OCR-D-SEG-LINE
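The chaining in the three lines above, where each step reads the file group the previous one wrote, can be sketched as follows. The processor names are the renaming suggested in this comment and may not exist as executables; `build_commands` is a hypothetical helper that only assembles argument lists and does not run anything.

```python
from typing import List, Sequence, Tuple

# (processor name, output file group), copied from the suggestion above
STEPS: List[Tuple[str, str]] = [
    ("ocrd-tesserocr-crop-page", "OCR-D-SEG-PAGE"),
    ("ocrd-tesserocr-segment-regions", "OCR-D-SEG-BLOCK"),
    ("ocrd-tesserocr-segment-lines", "OCR-D-SEG-LINE"),
]


def build_commands(first_input: str, steps: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Build one argv list per step, feeding each output group into the next step."""
    commands = []
    current = first_input
    for processor, output in steps:
        commands.append([processor, "-I", current, "-O", output])
        current = output  # the next step consumes what this step produced
    return commands


commands = build_commands("OCR-D-IMG", STEPS)
for argv in commands:
    print(" ".join(argv))
```

Printing the joined argument lists reproduces the three command lines, which makes it easy to check that no step reads a file group that was never written.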

@kba
Member Author

kba commented Feb 3, 2021

@bertsky Is there still something to do from this discussion?

@bertsky
Collaborator

bertsky commented Feb 3, 2021

"Is there still something to do from this discussion?"

Hard to summarise, even harder to reach an agreement at this point.

We have:

  • terminology applied by processor names (often in the form of verbs)
  • terminology applied by ocrd-tool.json and METS steps in the spec and workflow steps in the documentation

We need to accommodate:

  • ambiguity of intellectual process vs. automatic detectors
  • ambiguity of image operation (crop/cut) vs. top-level (page-frame detection)
  • ambiguity of what level to apply to (or start from) vs. what level to reach

I'm afraid we cannot re-invent the wheel here, or just ignore existing terminology in the academic literature or in the field.

I suggest the following: stick to page frame detection where cropping needs to be disambiguated; try to avoid cropping as a term for the general image operation; keep the idiomatic page segmentation for the segmentation of pages into regions, and line segmentation for the segmentation of regions into lines, disambiguating further when necessary; and document all this in the glossary and specs.

@OCR-D locked and limited conversation to collaborators Dec 20, 2021
@lena-hinrichsen converted this issue into discussion #772 Dec 20, 2021
