ENH: accepts ETen-B5 and UniCNS-UTF16 encodings #2721
There are three aspects I am not sure about:
|
The TBC markers are only there while waiting for feedback from @actuary-chen.
I did not focus on it, as this should not be easily subject to regression, but I agree it should be better.
I dislike the idea of having a catch-all issue on this subject: we need test files to confirm the proper encoding, so I prefer that new issues be raised case by case. |
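As an illustration of what confirming an encoding could look like before a real test file is available, here is a minimal sketch. It assumes that the ETen-B5 CMaps can be decoded with Python's `big5` codec and the UniCNS-UTF16 CMaps with `utf-16-be`; the `CANDIDATE_CODECS` table and the `decode_with_cmap_name` helper are hypothetical names used only for this example and are not taken from pypdf's source.

```python
# Hypothetical mapping from predefined CMap names to Python codecs.
# Assumption for illustration only: this is not pypdf's actual table, and the
# plain "big5" codec may not cover every ETen extension character.
CANDIDATE_CODECS = {
    "/ETen-B5-H": "big5",
    "/ETen-B5-V": "big5",
    "/UniCNS-UTF16-H": "utf-16-be",
    "/UniCNS-UTF16-V": "utf-16-be",
}


def decode_with_cmap_name(raw: bytes, cmap_name: str) -> str:
    """Decode raw string bytes using the codec assumed for a CMap name."""
    codec = CANDIDATE_CODECS.get(cmap_name)
    if codec is None:
        raise KeyError(f"no candidate codec for {cmap_name!r}")
    return raw.decode(codec)


# Round-trip check: encode a known Traditional Chinese sample with each
# candidate codec and confirm that decoding returns the same text.
sample = "中文測試"
for name, codec in CANDIDATE_CODECS.items():
    assert decode_with_cmap_name(sample.encode(codec), name) == sample
```

A round trip like this only shows that the codec is self-consistent; a PDF that actually uses these CMaps is still needed to confirm that the extracted text is right.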
I've removed all TBC. Let's wait a little for some feedback from @actuary-chen on the last entries. |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2721   +/-   ##
=======================================
  Coverage   95.14%   95.14%
=======================================
  Files          51       51
  Lines        8547     8547
  Branches     1703     1703
=======================================
  Hits         8132     8132
  Misses        261      261
  Partials      154      154
|
I initially opened the corresponding issue to discuss how this could be done in general or whether there might be any official test documents which would allow us to cover all cases without having lots of small commits for it. |
It looks good now that I have retrieved some texts from the database.
On Sunday, June 23, 2024 at 5:25 AM, Benjamin Chen ***@***.***> wrote:
… I can only confirm that the message "pypdf._cmap: implementation of
advance cmap ...." no longer appears.
However, I cannot verify whether the text is decoded correctly,
because I feed it into an embedding model for a vector database.
|
## What's new

### New Features (ENH)
- Accept ETen-B5 and UniCNS-UTF16 encodings (#2721) by @pubpub-zz
- Add decode_as_image() to ContentStreams (#2615) by @pubpub-zz
- context manager for PdfReader (#2666) by @tibor-reiss
- Add capability to set font and size in fields (#2636) by @pubpub-zz
- Allow to pass input file without named argument (#2576) by @pubpub-zz

### Bug Fixes (BUG)
- Fix deprecation for Ressources when using old constants (#2705) by @stefan6419846
- Fix images issue 4 bits encoding and LUT starting with UTF16_BOM (#2675) by @pubpub-zz
- Reading large compressed images takes huge time to process (#2644) by @snanda85
- Highlighted Text Cannot Be Printed (#2604) by @Nifury
- Fix UnboundLocalError on malformed pdf (#2619) by @farjasju

### Documentation (DOC)
- Various improvements on docstrings and examples by @j-t-1

### Robustness (ROB)
- Cope with missing Standard 14 fonts in fields (#2677) by @pubpub-zz
- Improve inline image extraction (#2622) by @pubpub-zz
- Cope with loops in Fields tree (#2656) by @pubpub-zz
- Discard /I in choice fields for compatibility with Acrobat (#2614) by @pubpub-zz
- Cope with some issues in pillow (#2595) by @pubpub-zz
- Cope with some image extraction issues (#2591) by @pubpub-zz

### Maintenance (MAINT)
- Deprecate interiour_color with replacement interior_color (#2706) by @j-t-1
- Add deprecate_with_replacement to PdfWriter.find_bookmark (#2674) by @j-t-1

### Code Style (STY)
- Change Link to be a non-markup annotation (#2714) by @j-t-1

[Full Changelog](4.2.0...4.3.0)
Related to #2356