
TST: Add test for layout extraction mode #2390

Merged (2 commits) on Jan 3, 2024


@MartinThoma (Member) commented Jan 3, 2024

Prepare a test for #2388


codecov bot commented Jan 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4423267) 94.35% compared to head (dac8321) 94.35%.

```diff
@@           Coverage Diff           @@
##             main    #2390   +/-   ##
=======================================
  Coverage   94.35%   94.35%
=======================================
  Files          43       43
  Lines        7577     7577
  Branches     1519     1519
=======================================
  Hits         7149     7149
  Misses        265      265
  Partials      163      163
```
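The coverage percentage in the report follows from the line counts: the numbers are consistent with Codecov's ratio of hits to all coverable lines, where partially covered lines count as not covered (hits + misses + partials = 7149 + 265 + 163 = 7577 lines). A minimal sketch of that arithmetic, with `codecov_coverage` as a hypothetical helper name:

```python
def codecov_coverage(hits: int, misses: int, partials: int) -> float:
    """Coverage in percent: hits / (hits + misses + partials).

    Partially covered lines (e.g. a branch taken only one way) are counted
    in the denominator but not as hits.
    """
    coverable = hits + misses + partials
    return 100.0 * hits / coverable


# Numbers from the coverage diff for main and #2390
print(f"{codecov_coverage(7149, 265, 163):.2f}%")  # prints "94.35%"
```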


@MartinThoma merged commit 89baa2c into main Jan 3, 2024
15 checks passed
@MartinThoma deleted the layout-mode-test branch January 3, 2024 10:43
shartzog added a commit to shartzog/pypdf that referenced this pull request Jan 4, 2024
- DOC: standardize language; use "layout", not "structure"/"structural".
- BUG: address bug introduced by ruff refactoring (remove "TYPE_CHECKING" block for Literal import)
- DEV: use a sys.version_info-based import switch (not try/except) for Literal and TypedDict to fix VS Code syntax highlighting and prevent spurious mypy errors
- TST: add test created by @MartinThoma in py-pdf#2390
- ENH: add remaining standard fonts and aliases
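The DEV item above replaces a try/except import with a version check. A minimal sketch of that pattern, assuming Python 3.7 as the oldest supported version (so the `typing_extensions` backport is only needed there); the type names `ExtractionMode`, `FontSpec` and the `describe` function are hypothetical illustrations, not pypdf's actual definitions:

```python
import sys

# Version-gated import: mypy and IDEs evaluate sys.version_info comparisons
# statically and analyse only the branch matching the target Python version,
# unlike try/except ImportError, where both branches are considered.
if sys.version_info >= (3, 8):
    from typing import Literal, TypedDict
else:  # Python 3.7: Literal and TypedDict come from the typing_extensions backport
    from typing_extensions import Literal, TypedDict

# Hypothetical example types
ExtractionMode = Literal["plain", "layout"]


class FontSpec(TypedDict):
    name: str
    size: float


def describe(mode: ExtractionMode, font: FontSpec) -> str:
    """Format a short description of an extraction mode and font."""
    return f"{mode}: {font['name']} {font['size']}pt"
```

For example, `describe("layout", {"name": "Helvetica", "size": 12.0})` returns `"layout: Helvetica 12.0pt"`, and a type checker would reject a mode such as `"table"`.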
MartinThoma added a commit that referenced this pull request Jan 19, 2024
## What's new

pypdf==4.0.0 is a big milestone forward:

* We finally have layout-mode text extraction.
  This lets users who want to detect or extract tables
  with heuristics give it a try.
* We deprecated a lot of the old PyPDF2 API that either
  did not follow PEP 8 naming conventions or did not use a
  property. Users coming from PyPDF2 might want to switch
  to pypdf<4.0.0 first to get helpful error messages
  that show the new API for their specific cases.
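Alongside the pypdf<4.0.0 route, a crude text scan can flag PyPDF2-era names before upgrading. This is a minimal sketch, not a pypdf tool: `find_deprecated_names` is a hypothetical helper, and `DEPRECATED_NAMES` is a partial, illustrative mapping of well-known renames (for example `PdfFileReader` to `PdfReader` and `extractText` to `extract_text`), not the full list of removals.

```python
import re
from typing import Dict, List, Tuple

# Partial mapping of old PyPDF2-style names to their pypdf replacements
DEPRECATED_NAMES: Dict[str, str] = {
    "PdfFileReader": "PdfReader",
    "PdfFileWriter": "PdfWriter",
    "getPage": "reader.pages[index]",
    "getNumPages": "len(reader.pages)",
    "numPages": "len(reader.pages)",
    "extractText": "extract_text",
    "getDocumentInfo": "metadata",
    "isEncrypted": "is_encrypted",
    "addPage": "add_page",
    "mediaBox": "mediabox",
}


def find_deprecated_names(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, old name, suggested replacement) for each match."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in DEPRECATED_NAMES.items():
            # \b prevents matches inside longer identifiers
            if re.search(rf"\b{old}\b", line):
                hits.append((lineno, old, new))
    return hits
```

Being purely textual, the scan also matches names in comments and strings, so treat its output as a checklist rather than a guarantee.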

A big 'Thank you!' to the whole pypdf community for your
work. Thanks to you, pypdf is better than ever.

Kudos to @shartzog, who added layout mode in his first
contribution!

### Deprecations (DEP)
-  Drop Python 3.6 support (#2369) by @MartinThoma
-  Remove deprecated code (#2367) by @MartinThoma
-  Remove deprecated XMP properties (#2386) by @stefan6419846

### New Features (ENH)
-  Add "layout" mode for text extraction (#2388) by @shartzog
-  Add Jupyter Notebook integration for PdfReader (#2375) by @MartinThoma
-  Improve/rewrite PDF permission retrieval (#2400) by @stefan6419846
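As a rough illustration of the layout mode from #2388 and the table heuristics mentioned in the release summary, the sketch below obtains layout text through `PdfReader(...).pages[i].extract_text(extraction_mode="layout")` (pypdf>=4.0.0) and then applies a hypothetical heuristic: runs of two or more spaces separate table cells. The helper names and the two-space rule are assumptions for illustration; real tables often need more robust column detection.

```python
import re
from typing import List


def extract_layout_text(path: str, page_index: int = 0) -> str:
    """Return layout-mode text for one page (requires pypdf>=4.0.0)."""
    from pypdf import PdfReader  # lazy import: the heuristic below works without pypdf

    reader = PdfReader(path)
    return reader.pages[page_index].extract_text(extraction_mode="layout")


# Hypothetical heuristic: two or more consecutive spaces mark a cell boundary,
# so single spaces inside a cell (e.g. "New York") are preserved.
_CELL_GAP = re.compile(r" {2,}")


def split_table_rows(layout_text: str) -> List[List[str]]:
    """Split layout-mode text into rows of cells, skipping blank lines."""
    rows = []
    for line in layout_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # layout mode may emit blank lines for vertical spacing
        rows.append(_CELL_GAP.split(stripped))
    return rows
```

Usage might look like `split_table_rows(extract_layout_text("report.pdf"))`, where `report.pdf` is a placeholder path.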

### Bug Fixes (BUG)
-  PdfWriter.add_uri was setting the wrong type (#2406) by @pmiller66
-  Add support for GBK2K cmaps (#2385) by @stefan6419846

### Documentation (DOC)
-  Add pmiller66 for #2406 as a contributor by @MartinThoma
-  Add missing expand parameter (#2393) by @Atomnp
-  Resolve build warnings (#2380) by @stefan6419846
-  Fix testing prerequisites (#2381) by @stefan6419846
-  Improve formatting of contributors page (#2383) by @stefan6419846
-  Add Tobeabellwether as a contributor for #2341 by @MartinThoma

### Developer Experience (DEV)
-  Make dependabot aware of our PR prefixes (#2415) by @stefan6419846
-  Fail on Sphinx issues (#2405) by @stefan6419846
-  Move title check to own workflow (#2384) by @MasterOdin
-  Write to temporary files instead of the working directory (#2379) by @stefan6419846
-  Ensure that the PR titles have the correct format (#2378) by @stefan6419846

### Maintenance (MAINT)
-  Complete FileSpecificationDictionaryEntries constants (#2416) by @MartinThoma
-  Return None instead of -1 when page is not attached (#2376) by @MartinThoma
-  Replace warning with logging.error (#2377) by @MartinThoma

### Testing (TST)
-  Add missing pytest.mark.samples annotations (#2412) by @kitterma
-  Correctly close temporary files (#2396) by @stefan6419846
-  Fix side effect of #2379 (#2395) by @pubpub-zz
-  Add test for layout extraction mode (#2390) by @MartinThoma

### Code Style (STY)
-  Use the UserAccessPermissions enum (#2398) by @MartinThoma
-  Run black (#2370) by @MartinThoma

[Full Changelog](3.17.4...4.0.0)