[Feature request] how to batch many .png files? #14

ccchan234 · 2023-02-02T07:22:29Z

Is your feature request related to a problem? Please describe.

I have tons of files, and right now Text Extractor (TE) has to be run on them one file at a time.

Describe the solution you'd like

Select several files, right-click, and choose "extract to separate files"; each file's text is then extracted to its own separate file. (Some people may also want to extract everything into one single file, but in that case please add each filename to that single document, thanks.)
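For the "everything into one single file" variant, the combining step could look roughly like the TypeScript sketch below. It is only an illustration: `ExtractedFile` and `combineExtractions` are hypothetical names, not part of Text Extractor, and writing one heading per file is just one possible way to keep each filename next to its text.

```typescript
// Hypothetical shape for one extraction result; not a Text Extractor type.
interface ExtractedFile {
  path: string; // path of the source image
  text: string; // text returned by the extraction step
}

// Join several results into one document, labelling each block with its filename.
function combineExtractions(files: ExtractedFile[]): string {
  return files
    .map((f) => `## ${f.path}\n\n${f.text.trim()}\n`)
    .join("\n");
}
```

The "separate files" variant would skip the join and write each `text` to its own output file instead.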

Describe alternatives you've considered

Offering the same thing in the form of a command.

Additional context

ccchan234 · 2023-02-02T07:23:10Z

I have to say TE is very accurate for me, with screenshots taken of past test MCQ questions.

Thanks.

danielo515 · 2024-01-04T11:52:46Z

I also find it a bit confusing how to use this plugin.
I was expecting some command to scan all the images and generate the cache from them, or, as this issue states, to process a whole folder. Is this even possible?

scambier · 2024-01-04T12:06:10Z

Text Extractor was first and foremost built as a sort of "plugin's plugin". The idea was to provide a few basic helper functions for developers to build or expand their own plugins on top of it. To my knowledge, though, it's not used by anything other than Omnisearch.

> I was expecting some command to scan all the images and generate cache from them

What is your use case?

danielo515 · 2024-01-04T12:17:15Z

My use case is to make all the text in my images available for search with Omnisearch. I want to run the extraction on all of them so I can leverage the cache on mobile.

scambier · 2024-01-04T17:24:53Z

OK, so you just need to enable image and PDF indexing in the Omnisearch settings on a desktop PC. Omnisearch will ask Text Extractor to get the text for all those files, and that will generate the cache 👍
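For plugin developers, a batch pass of this kind could be driven along the lines of the TypeScript sketch below. This is not Omnisearch's actual code: the `Extractor` interface is an assumed stand-in for the helper functions mentioned earlier in the thread, and `warmCache` is a hypothetical name. Check the plugin's README for the real API before relying on any of these names.

```typescript
// Assumed minimal interface for an extraction helper; the real Text Extractor
// API may differ in names and argument types.
interface Extractor {
  canFileBeExtracted(path: string): boolean;
  extractText(path: string): Promise<string>;
}

// Run extraction over every supported file, one at a time, so that an
// extractor which caches its results ends up with a warm cache.
// Failures are recorded rather than rethrown, so one bad file does not stop the pass.
async function warmCache(
  extractor: Extractor,
  paths: string[],
): Promise<{ done: string[]; failed: string[] }> {
  const done: string[] = [];
  const failed: string[] = [];
  for (const path of paths) {
    if (!extractor.canFileBeExtracted(path)) continue; // skip unsupported types
    try {
      await extractor.extractText(path);
      done.push(path);
    } catch {
      failed.push(path);
    }
  }
  return { done, failed };
}
```

Processing files sequentially keeps the load predictable, at the cost of a long run time on large vaults.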

danielo515 · 2024-01-05T10:54:10Z

OK, thanks. I think I have that enabled, but I will double-check.

paulpall · 2024-06-08T13:01:53Z

> Ok so you just need to enable images and pdf indexing in Omnisearch settings on a desktop PC. Omnisearch will ask Text Extractor to get the text for all those files, and that will generate the cache 👍

I'm not sure if I have missed anything but I can't seem to get this to work with images either. PDF content seems to have been indexed, but with images I have to manually right-click and extract text to clipboard for each image to show up in search.

I had a look at the logs and there were a lot of `Text Extractor - OCR Worker timeout image_name eval @ plugin:text-extractor:5068` messages... I'm on an ARM macOS laptop, perhaps there's some conflict stemming from that?

Perhaps a workaround could be a button in the settings to ignore timeouts and have it index all the images automatically? Even if it does take hours, as long as there's a way to keep an eye on the progress, I wouldn't mind.

scambier · 2024-06-10T18:18:01Z

@paulpall

> Perhaps a workaround could be a button in the settings to ignore timeouts and have it index all the images automatically? Even if it does take hours

That's what is happening already, when Omnisearch uses Text Extractor, as long as this is enabled.

But if you have many images that cause a timeout (maybe they're particularly large or too complex for the OCR library), the worker is effectively blocked for 120 seconds on a single image, then blocked again on the next image, and so on.

Eventually it will go through all of them, though, as images are only processed once, even when they time out.
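The behaviour described above (one image at a time, a fixed time limit per image, and no retry after a timeout) can be illustrated with the TypeScript sketch below. It is not Text Extractor's actual implementation: `withTimeout`, `processQueue`, the `ocr` callback and the `seen` set are all hypothetical, and only the 120-second figure comes from this thread.

```typescript
// Race a task against a timer; resolves to null if the timer fires first.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T | null> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), ms);
    task.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Process images strictly one after another. An image is added to `seen`
// before OCR starts, so a timed-out image is never retried on a later pass.
async function processQueue(
  images: string[],
  ocr: (image: string) => Promise<string>, // hypothetical OCR callback
  timeoutMs: number,                        // e.g. 120_000 for 120 seconds
  seen: Set<string> = new Set(),
): Promise<Map<string, string | null>> {
  const results = new Map<string, string | null>();
  for (const image of images) {
    if (seen.has(image)) continue;
    seen.add(image);
    results.set(image, await withTimeout(ocr(image), timeoutMs));
  }
  return results;
}
```

Under this model the worst case is easy to estimate: 30 images that all hit a 120-second limit would block the queue for 30 × 120 s = 3600 s, i.e. about an hour. Note that this sketch only stops waiting for a slow task; it does not cancel the underlying work.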
