Allow for text angle/gradient to be retrieved #4070

Merged (6 commits) on May 12, 2024

Conversation

Balearica
Contributor

@Balearica Balearica commented May 9, 2023

Tesseract already calculates the average gradient (angle) of text lines within Textord::TextordPage. This gradient is useful information (Tesseract performs poorly when the gradient is not [almost] zero), but it currently exists only inside the Textord::TextordPage function, with no way for users to access it. Using the API, getting the gradient currently requires running Recognize or AnalyseLayout and using the results to manually re-calculate the gradient.

This PR allows users to directly retrieve the existing average gradient value calculated in Textord::TextordPage using a function named GetGradient. This function can be called any time after FindLines has been run. I also made FindLines public so it can be run directly without running Recognize or AnalyseLayout first (running AnalyseLayout would result in paragraph recognition being run twice).

I've already used this branch to implement an auto-rotate feature in the latest version of Tesseract.js that (unlike adding an auto-rotate pre-processing step) does not negatively impact performance for images without problematic rotation. A basic script using GetGradient is below for demonstrative purposes, along with a test image (named rotate_image.png in the code). Resolves #3836.

#include <cstdio>
#include <cstdlib>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
    tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
    // Initialize tesseract-ocr with English, without specifying tessdata path
    if (api->Init(NULL, "eng")) {
        fprintf(stderr, "Could not initialize tesseract.\n");
        exit(1);
    }

    // Open input image with leptonica library
    Pix *image = pixRead("rotate_image.png");
    if (image == NULL) {
        fprintf(stderr, "Could not read rotate_image.png.\n");
        exit(1);
    }
    api->SetImage(image);

    // Run layout analysis (which finds lines), then get the average gradient
    tesseract::PageIterator *it = api->AnalyseLayout();
    float gradient = api->GetGradient();

    printf("Average Gradient: %f\n", gradient);

    // Destroy used objects and release memory
    delete it; // AnalyseLayout returns an iterator owned by the caller
    api->End();
    delete api;
    pixDestroy(&image);

    return 0;
}

Average Gradient: 0.085301
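
For readers who think in degrees rather than slopes, the value can be converted with an arctangent. The helper below is only a sketch: it assumes the gradient is a slope (rise over run), which the thread does not state explicitly, and `gradientToDegrees` is a hypothetical name that is not part of the Tesseract API.

```cpp
#include <cmath>

// Hypothetical helper, not part of Tesseract.
// Assumption: the gradient is a slope (rise over run), so the
// corresponding angle is its arctangent.
double gradientToDegrees(double gradient) {
    const double kPi = std::acos(-1.0);
    return std::atan(gradient) * 180.0 / kPi;
}
```

Under that assumption, the sample value 0.085301 corresponds to a skew of a little under 5 degrees.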

@amitdo
Collaborator

amitdo commented May 9, 2023

I also made FindLines public so it can be run directly without running Recognize or AnalyseLayout first (running AnalyseLayout would result in paragraph recognition being run twice).

So paragraph detection will run twice if AnalyseLayout is followed by Recognize. This looks like a bug.

It can be solved by using this condition:

bool wait_for_text = true;
GetBoolVariable("paragraph_text_based", &wait_for_text);
if (!wait_for_text) {
  DetectParagraphs(false);
}

also in AnalyseLayout.

Then we won't need to expose FindLines.

What do you think?

@zdenop
Contributor

zdenop commented May 9, 2023

Why not use the Leptonica solution?
E.g.

#include <leptonica/allheaders.h>

int main() {
  PIX *pix2;
  l_float32 angle, conf;
  Pix *image = pixRead("rotate_image.png");
  pix2 = pixFindSkewAndDeskew(image, 2, &angle, &conf);
  printf("Skew angle: %7.2f degrees; %6.2f conf\n", angle, conf);
  pixWrite("fixed_rotate_image.png", pix2, IFF_PNG);

  pixDestroy(&image);
  pixDestroy(&pix2);
  return 0;
}

@amitdo
Collaborator

amitdo commented May 9, 2023

Why not use the Leptonica solution?

Because it is not necessary. Tesseract does it anyway.

@zdenop
Contributor

zdenop commented May 9, 2023

My understanding is that users want to fix rotation before running OCR.
This feature will require using SetImage twice (first to get the angle and then for the corrected image). I guess my proposal will be faster and without the need to touch the Tesseract API ;-)

@Balearica
Contributor Author

Balearica commented May 9, 2023

My understanding is that users want to fix rotation before running OCR. This feature will require using SetImage twice (first to get the angle and then for the corrected image). I guess my proposal will be faster and without the need to touch the Tesseract API ;-)

First, using the text gradient number that Tesseract already calculates does not add any extra steps or runtime for images that are not flagged as having problematic text angles. While I'm sure that adding additional pre-processing steps is a viable solution in many contexts (e.g. processing scanned documents), when building applications where speed is a very high priority, sending all input images through an extra step is sub-optimal. My use case here is maintaining Tesseract.js, which is primarily used in web applications rather than document processing.

Second, Leptonica uses a different methodology from Tesseract for calculating the angle of the page, so using Leptonica's algorithm adds another point of failure. While both Tesseract and Leptonica sometimes calculate the angle incorrectly, if Tesseract calculates the angle incorrectly the OCR results were almost certainly going to be bad anyway (as this calculation occurs during the line detection step). Using the angle Tesseract calculates is inherently low-risk in that regard. On the other hand, as Leptonica uses a different algorithm, it can calculate text gradient incorrectly in a way that harms images that would otherwise produce high-quality results. When testing both solutions with sample documents, I found the implementation using the angle calculated by Tesseract to produce better results.

Overall, while an individual user may decide that using a separate auto-rotate script is better for their workflow, I think that the angle calculated by Tesseract is useful information and do not believe there's any reason it should not be accessible to the user.
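
The "only flagged images pay the cost" idea described above can be sketched as a simple threshold check on the gradient from the first pass. This is an illustration, not the Tesseract.js implementation: `needsSecondPass` and the 1-degree default are hypothetical, and the code again assumes the gradient is a slope.

```cpp
#include <cmath>

// Hypothetical threshold; not a value taken from Tesseract or Tesseract.js.
constexpr double kMaxSkewDegrees = 1.0;

// Returns true when the gradient reported after layout analysis implies
// a skew large enough that rotating the image and running a second pass
// may be worthwhile. Images below the threshold incur no extra work.
// Assumption: the gradient is a slope, so the angle is atan(gradient).
bool needsSecondPass(double gradient,
                     double maxSkewDegrees = kMaxSkewDegrees) {
    const double kPi = std::acos(-1.0);
    const double degrees = std::atan(gradient) * 180.0 / kPi;
    return std::fabs(degrees) > maxSkewDegrees;
}
```

A caller would run layout analysis once, pass the result of GetGradient to this check, and only rotate the image and call SetImage again when it returns true.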

@Balearica
Contributor Author

I also made FindLines public so it can be run directly without running Recognize or AnalyseLayout first (running AnalyseLayout would result in paragraph recognition being run twice).

So paragraph detection will run twice if AnalyseLayout is followed by Recognize. This looks like a bug.

It can be solved by using this condition:

bool wait_for_text = true;
GetBoolVariable("paragraph_text_based", &wait_for_text);
if (!wait_for_text) {
  DetectParagraphs(false);
}

also in AnalyseLayout.

Then we won't need to expose FindLines.

What do you think?

I think this makes sense conceptually; however, if I understand correctly, such a change would impact the results returned by the AnalyseLayout API function when run using default settings (paragraph_text_based is true by default). I'm always hesitant to advocate for any change that could impact existing code. If there is opposition to making FindLines public, I think it would be fine to leave FindLines protected and use AnalyseLayout instead, even if paragraph detection runs multiple times. I timed these functions using the sample image I posted above, and FindLines took ~75ms to run while DetectParagraphs took ~2ms to run, so the overall impact of this inefficiency appears to be fairly minor.
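
Timings like these can be gathered with a small wall-clock helper. The sketch below is one possible way to do it, not the method used for the numbers above; `elapsedMs` is a hypothetical name.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double elapsedMs(Work &&work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Wrapping a call to AnalyseLayout in a lambda and passing it to this helper would measure line finding and paragraph detection together; timing FindLines or DetectParagraphs separately requires instrumenting Tesseract itself, since neither is exposed as a separate public step. With the figures quoted above, one extra DetectParagraphs call adds roughly 2ms on top of about 75ms, or under 3%.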

@amitdo
Collaborator

amitdo commented May 10, 2023

You are right that my suggestion changes the current behavior, so it's not a good idea.

I still think we should not expose FindLines just to make this use case work.

My new suggestion is to choose one of these options:

  • Take my previous suggestion, but instead of reusing paragraph_text_based, add a new user-configurable variable and use it inside AnalyseLayout.
  • Use AnalyseLayout without changes. As you said, speed-wise, the impact of the two calls to DetectParagraphs is quite small.

Apart from this small issue, I like the new feature.

@Balearica
Contributor Author

@amitdo I changed FindLines back to a protected function. I'm fine with running AnalyseLayout as written.

@amitdo
Collaborator

amitdo commented May 10, 2023

@zdenop,

After reading @Balearica's answer to your question, do you object to merging this PR?

@stweil, can we merge it?

src/ccmain/tesseractclass.h Show resolved Hide resolved
@zdenop
Contributor

zdenop commented May 10, 2023

First of all: changing/extending C++-API should be reflected in C-API too.

Next: playing with the public API has an impact on symver, which has an impact on including new versions in major Linux distributions. This should be carefully planned.

Personally (i.e. the next point is not a showstopper), I prefer that image-related operations are handled by Leptonica. Maybe I am missing something, so information on how the gradient is planned to be used would help make it clear to me ;-) (e.g. to measure speed/performance).
Also, I would like to get an example image where Tesseract provides a better calculation of the text gradient than Leptonica, so Dan can have a look at it...

@Balearica
Contributor Author

Balearica commented May 10, 2023

First of all: changing/extending C++-API should be reflected in C-API too.

Good point, I edited the C API to reflect this change.

@amitdo
Collaborator

amitdo commented May 12, 2023

Regarding semver, the current revision of this PR only adds one method to the public API, so the next version should be 5.4.0.

@zdenop zdenop added this to the 5.4.0 milestone May 12, 2023
GerHobbelt added a commit to GerHobbelt/tesseract that referenced this pull request Aug 11, 2023
GerHobbelt added a commit to GerHobbelt/tesseract that referenced this pull request Aug 11, 2023
@wvanrensburg

Anyone know when this will get merged? Almost a year now

@zdenop zdenop requested a review from stweil May 12, 2024 13:16
Contributor

@stweil stweil left a comment

Thank you!

@stweil stweil merged commit c23792b into tesseract-ocr:main May 12, 2024
1 check passed
zdenop referenced this pull request May 12, 2024
Development

Successfully merging this pull request may close these issues.

Detect text rotation without running recognition
6 participants