fix the issue of ernie-layout model inference error caused by invalid… #5866

zirui · 2023-05-08T05:28:39Z

PR types: Bug fixes

PR changes: APIs

Description

Fix the ernie-layout model inference error caused by an invalid input image (one that has no OCR results). See issue #5865.


paddle-bot · 2023-05-08T05:28:44Z

Thanks for your contribution!

CLAassistant · 2023-05-08T05:28:45Z

All committers have signed the CLA.

linjieccc · 2023-05-08T07:24:52Z

model_zoo/ernie-layout/deploy/python/predictor.py

-            input_data.append(example)
+            if ocr_result:
+                # Only process images with ocr results
+                example = ppocr2example(ocr_result, doc)


@zirui Thanks a lot for your contribution!

I wonder whether it would be better to handle empty OCR results inside the `ppocr2example` function.

The deployment script predictor.py is shared among NER, QA, and classification tasks.

This approach works for the NER and QA tasks, but it may cause problems for document classification: an image with empty OCR results should still be assigned a category, whereas the current approach would produce an empty string.
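To make the classification concern concrete, here is a toy sketch contrasting the two strategies: skipping images without OCR text inside the predictor loop (as in the diff above) versus always building an example, even an empty one. Every name here (`toy_classify`, `predict_skip_empty`, `predict_keep_empty`) is hypothetical and not part of PaddleNLP; the OCR result is simplified to a list of text lines.

```python
from typing import Dict, List

# Hypothetical, simplified OCR output: the recognized text lines of one image
# (an empty list when the image contains no detectable text).
OcrResult = List[str]


def toy_classify(tokens: List[str]) -> str:
    """Toy stand-in for a document classifier: it always returns a label,
    even when there are no tokens (e.g. a default class for blank pages)."""
    return "invoice" if "Total" in tokens else "other"


def predict_skip_empty(docs: Dict[str, OcrResult]) -> List[Dict[str, str]]:
    """Strategy from the diff: documents without OCR text never reach the model."""
    results = []
    for name, ocr_result in docs.items():
        if ocr_result:  # empty OCR result -> document is silently dropped
            results.append({"doc": name, "result": toy_classify(ocr_result)})
    return results


def predict_keep_empty(docs: Dict[str, OcrResult]) -> List[Dict[str, str]]:
    """Alternative: always build an example, with empty tokens if necessary."""
    return [{"doc": name, "result": toy_classify(ocr)} for name, ocr in docs.items()]
```

With `docs = {"a.png": ["Total", "42"], "blank.png": []}`, the skipping strategy returns a result only for `a.png`, so the blank image receives no category at all, while the second strategy still labels it. For NER and QA an empty answer is acceptable, which is why the difference only surfaces for classification.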

zirui · May 8, 2023

The NER/QA/classification tasks share the `predictor.py` script, so all of them have this problem.
I agree that we need to discuss what should be returned when the OCR input is empty: what result should each task return? Do you have any suggestions?

Handling empty OCR results inside `ppocr2example` should also solve this problem, but it requires a better understanding of which example to pass to the downstream processors of each task (QA/NER/classification), so I chose the current, simpler approach. I will review how each task processes its input to confirm which method is more suitable.

linjieccc · May 9, 2023

@zirui I would suggest handling empty OCR results inside the `ppocr2example` function.

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/utils/image_utils.py#LL733C5-L733C106

example = {"text": doc_tokens, "bbox": doc_boxes, "width": im_w, "height": im_h, "image": img_base64}

For an example with empty OCR results, "text" and "bbox" are expected to be empty lists, while "width", "height", and "image" retain the original image information.
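A minimal sketch of the contract described above, assuming PaddleOCR-style line results of the form `[four corner points, (text, score)]`. `ppocr2example_sketch` is a hypothetical simplification (one token per OCR line, width/height/base64 image passed in directly), not the actual code at the linked line; it only shows that an empty OCR result yields empty "text" and "bbox" lists while the image metadata survives.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Assumed PaddleOCR-style line: ([x, y] corner points, (text, confidence)).
OcrLine = Tuple[Sequence[Sequence[float]], Tuple[str, float]]


def ppocr2example_sketch(
    ocr_lines: List[OcrLine], width: int, height: int, image_b64: str
) -> Dict[str, Any]:
    """Hypothetical simplification of ppocr2example: one token per OCR line.

    With no OCR lines the loop never runs, so "text" and "bbox" stay empty
    lists while width/height/image are still returned, giving downstream
    task processors a valid (if empty) example instead of nothing.
    """
    doc_tokens: List[str] = []
    doc_boxes: List[List[int]] = []
    for points, (text, _score) in ocr_lines:
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Collapse the quadrilateral into an axis-aligned [x0, y0, x1, y1] box.
        doc_boxes.append([int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))])
        doc_tokens.append(text)
    return {"text": doc_tokens, "bbox": doc_boxes, "width": width, "height": height, "image": image_b64}
```

Returning a fully shaped dict, rather than skipping the document, keeps one example per input image, so task-specific code decides what an empty document means (a default class, no entities, no answer).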

zirui · May 19, 2023

I have adopted your suggestion and modified `ppocr2example`. Now, for an image with no OCR results, the different tasks return results like these:

cls: [{'doc': './images/test_image_no_ocr.png', 'result': 'specification'}]

mrc: []

ner: [{'doc': './images/test_image_no_ocr.png', 'result': []}]

codecov · 2023-05-19T08:34:40Z

Codecov Report

Merging #5866 (722e1d9) into develop (c9e4fd7) will increase coverage by 0.50%.
The diff coverage is 16.66%.

@@             Coverage Diff             @@
##           develop    #5866      +/-   ##
===========================================
+ Coverage    61.85%   62.35%   +0.50%     
===========================================
  Files          490      491       +1     
  Lines        69003    69280     +277     
===========================================
+ Hits         42679    43201     +522     
+ Misses       26324    26079     -245

Impacted Files	Coverage Δ
paddlenlp/utils/image_utils.py	`49.71% <0.00%> (ø)`
paddlenlp/prompt/prefix.py	`21.01% <13.33%> (-0.36%)`	⬇️
paddlenlp/utils/env.py	`85.36% <100.00%> (+0.36%)`	⬆️

... and 25 files with indirect coverage changes
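The report's percentages follow from hits divided by lines. A quick check of the numbers above, assuming Codecov truncates to two decimal places rather than rounding (with ordinary rounding, 43201 / 69280 would display as 62.36%). The helper names are illustrative only.

```python
def coverage_bps(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, truncated via integer floor division."""
    return hits * 10000 // lines


def fmt(bps: int) -> str:
    """Format hundredths of a percent as e.g. '61.85%' using exact integer math."""
    sign = "-" if bps < 0 else ""
    bps = abs(bps)
    return f"{sign}{bps // 100}.{bps % 100:02d}%"


base = {"lines": 69003, "hits": 42679}  # develop (c9e4fd7)
head = {"lines": 69280, "hits": 43201}  # #5866 (722e1d9)

for name, c in (("develop", base), ("#5866", head)):
    print(name, fmt(coverage_bps(c["hits"], c["lines"])), "misses:", c["lines"] - c["hits"])

delta = coverage_bps(head["hits"], head["lines"]) - coverage_bps(base["hits"], base["lines"])
print("change:", fmt(delta))
```

This reproduces 61.85% and 62.35%, the miss counts 26324 and 26079 (lines minus hits), and the +0.50% change.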

linjieccc

LGTM. Thanks again!

fix the issue of ernie-layout model inference error cause by invalid input of image (without ocr results), refer issue:PaddlePaddle#5865

0af5256

paddle-bot added the contributor and status: proposed labels on May 8, 2023

fix format check of pre-commit

ea72e05

sijunhe requested a review from linjieccc May 8, 2023 06:32

linjieccc reviewed May 8, 2023


modify ppocr2example to fix bugs caused by empty ocr input

722e1d9

linjieccc approved these changes May 19, 2023


linjieccc merged commit 3642ec5 into PaddlePaddle:develop May 19, 2023
