This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

PyPDF2 Ownership/Collaborators Available #657

Closed
mstamy2 opened this issue Mar 25, 2022 · 18 comments
@mstamy2
Collaborator

mstamy2 commented Mar 25, 2022

Hi all

Starting a discussion to see if anyone would be interested in taking full ownership of PyPDF2 or being added as a collaborator.

The folks over at PyPDF4 originally had wanted to take that repo to be the new definitive and fully-maintained version of the library, with some cleanup/refactoring and other backwards-incompatible changes, eventually deprecating PyPDF2.

Unfortunately that hasn't happened, and obviously PyPDF2 usage has only continued to grow. I have reached out in the past and it seems there wasn't much interest in anyone taking over, but I'm hoping that has changed now.

I encourage anyone to reach out who has a history of contributing to PyPDF2 and who has the time to give this excellent library the attention it deserves! I know the backlog of issues/PRs is stacked, and I do apologize to all users as this should have been done much sooner.

Please let me know if you have any interest!

@hyzyla

hyzyla commented Mar 28, 2022

It would be great to transfer ownership to an open source group, such as jazzband or encode. This will reduce the risk of the library being abandoned by a future maintainer, and it will be easier to obtain funding if some company wants to provide financial support.

@pubpub-zz
Collaborator

I've worked on a fork to fix/add features to PyPDF4, if anyone is interested in having a look:
https://github.com/pubpub-zz/PyPDF4

@nicksofn

Anyone have opinions on PyPDF3?

https://github.com/sfneal/PyPDF3

@pubpub-zz
Collaborator

Anyone have opinions on PyPDF3?

https://github.com/sfneal/PyPDF3

I tried it very briefly about 2 years ago. I ran into some bugs (similar to PyPDF2); some of them were fixed in PyPDF4.

@MasterOdin
Member

MasterOdin commented Mar 31, 2022

Hi mstamy2,

I'd like to volunteer to be added as a collaborator or fully take over maintenance of this library. I've used this library for quite a while, and I think it would be awesome to see it continue and flourish.

I've not made any commits towards PyPDF2 or its clones, but that has been more a function of not being sure any such contribution would have been merged than anything else. I do believe that I can meet the active time requirement for maintenance, and my GH profile should demonstrate a continued interest in open source stuff over the years.

@MartinThoma
Member

MartinThoma commented Apr 3, 2022

I'm interested; especially in #658

About me

I'm working full-time as a senior backend dev with Python daily and I am currently writing an introductory book about Python in my spare time. That includes a part where I mention how to deal with PDF files. This is how I got into this discussion.

In the past, I had several projects that included parsing thousands of PDF files (one where I needed to generate one; but that was a rather short project where I used jinja2+pdflatex). However, I only have experience in using different libraries ... and likely also not the deepest level of experience here.

I also twice gave a workshop for PhD students at TUM on how to create Python packages. A student there complained that one of the packages she uses is no longer maintained / not available for Python 3 ... now I maintain propy3 😅 🙈

At my current job, PDFs are less important. I'm still in contact with the clients from my consulting days who dealt a lot with PDF files.

PyPDF2 Project Governance

To make sure we don't end up again with a good package that has no support/maintenance, we need an organization, or at least two maintainers. I've just created a GitHub organization: https://github.com/py-pdf

However, we would need to have some community guides / governing docs: Which kinds of PRs do we accept? Who will be put in the author list? Who will become a maintainer and how do we expect maintainers to behave?

Maybe we could write something like https://docs.scipy.org/doc/scipy-1.8.0/html-scipyorg/dev/governance/governance.html

License and Authorship

I hope I'm not offending anybody by writing that much / going possibly already two steps too far; I'm excited 😅

Before I spend a significant amount of my free time with PDFs / PyPDF again, I would like to make sure that I'll not get into legal issues. On PyPI, I see this:

[image: PyPI license display for PyPDF2]

What does the "unknown" mean? Looking at the license file, it seems to be the standard 3-clause BSD.

Also, the original developer of PyPDF is Mathieu Fenniak who is not mentioned in the README (he is mentioned in the license file). Is that Ok for him?

@MasterOdin
Member

What does the "unknown" mean? Looking at the license file, it seems to be the standard 3-clause BSD.

PyPI does not look at the contents of the LICENSE file to determine the package license; it uses the License field in setup.py and, failing that, the Classifier field. For PyPDF2, it uses the classifier License :: OSI Approved :: BSD License (there's no precise classifier for 3-clause, see pypa/trove-classifiers#17), to which PyPI then appends the "UNKNOWN" bit so you know you'll need to check which variant it is.
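For anyone who wants to check these two metadata fields locally, here is a minimal sketch using the standard library's `importlib.metadata`. The helper names `license_classifiers` and `read_license_metadata` are made up for illustration, and this only reads what an installed distribution declares; it does not reproduce how PyPI renders the license.

```python
from importlib import metadata

LICENSE_PREFIX = "License :: "


def license_classifiers(classifiers):
    """Keep only the trove classifiers that describe a license."""
    return [c for c in classifiers if c.startswith(LICENSE_PREFIX)]


def read_license_metadata(dist_name):
    """Return (License field, license classifiers) for an installed distribution."""
    meta = metadata.metadata(dist_name)
    license_field = meta.get("License")  # None if the field is absent
    classifiers = meta.get_all("Classifier") or []
    return license_field, license_classifiers(classifiers)


if __name__ == "__main__":
    # Assumes PyPDF2 is installed; otherwise a short note is printed instead.
    try:
        print(read_license_metadata("PyPDF2"))
    except metadata.PackageNotFoundError:
        print("PyPDF2 is not installed in this environment")
```

The filtering step is kept as a separate function so it can also be applied to a classifier list copied from a setup.py.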

Also, the original developer of PyPDF is Mathieu Fenniak who is not mentioned in the README (he is mentioned in the license file). Is that Ok for him?

He's also listed as the author in setup.py and in the LICENSE file. PyPI just shows Phaseit as the "author" (described as the maintainer on PyPI), probably so that no one mistakenly emails Mathieu rather than Phaseit with support questions. Those interested in the question of code ownership can of course easily look that information up in the LICENSE or setup.py files.

@dantownsend

I'm glad there's interest in maintaining this great library. 👍

@mstamy2
Collaborator Author

mstamy2 commented Apr 5, 2022

@MartinThoma @MasterOdin I've added you both as collaborators - also have no problem giving one of you ownership if you're brave enough to commit to that! Else I think transferring to one of the open source groups is always a good option, either now or in the future.

Going off the discussion in #658, I do see the value with a new package name. I also really like the approach of phased major version bumps. Pretty confident we could also get PyPDF3 or PyPDF4 relatively easily as well, or have them all deprecated and point to the new library. Also with a new package, that sizable backlog of issues would disappear (:

Is there any consensus on the approach to take? I see @MartinThoma has already gotten started on https://pypi.org/project/pdffile, I guess with the intention of taking the pdf name?

@MartinThoma
Member

Thank you very much!

The "pdf" library I'm currently creating is more a collection of some ideas. I'm still not sure where this will go.

I think to start with it would be good to introduce CI and get good test coverage for PyPDF2, then take care of the many PRs. After that I'll know the library well enough to tell which of my current ideas really make sense and to introduce breaking changes with proper community alignment.

@MartinThoma
Member

Oh, and I'm looking forward to making the first new release of PyPDF2 on PyPI in a long time! Would it be possible to give me access there as well?

@rtpg

rtpg commented Apr 6, 2022

I would second jazzband as a good option for stewardship in the future, it allows for more collaboration and "next of kin"-style things to be handled smoothly. But of course now it looks like there's a way forward here

@MartinThoma
Member

From what I see, I like jazzband. However, I feel we first need to increase the test coverage + documentation before that is an option: https://jazzband.co/about/guidelines

And I also like having a GitHub organization which is dedicated to Python PDF stuff ... but I'm uncertain if that would ever take off 😄

@MasterOdin MasterOdin pinned this issue Apr 7, 2022
@MartinThoma
Member

PyPDF2 1.27.0 was just released 🎉

@MartinThoma
Member

@MasterOdin You seem to have more permissions than I do. Could you please remove the "wiki" and "project" setting via GitHub settings? We don't use them and I think they are only confusing.

@MasterOdin
Member

Only @mstamy2 has that power (as well as adjusting any other settings). In a spirit of community, it might be good to migrate this repo to the https://github.com/py-pdf org that @MartinThoma created, which could then have multiple owners who could pull levers like this.

@MartinThoma
Member

I would love to move it to a community and I like the idea of having a Python-PDF community. I could also imagine several repositories in there:

  • PyPDF2
  • Sample files for testing (could be useful for several projects)
  • Community Scripts (moving that away from PyPDF2)
  • PDF CLI Toolkit (e.g. pdfcat, potentially others)

Jazzband was mentioned before. At first glance that also looks like a good option.

My proposal would be this:

  1. A transition period so that @mstamy2 feels comfortable / can trust me with taking ownership. I was thinking of maybe 6 months?
  2. Moving the PyPDF2 repository to the GitHub py-pdf organization.
  3. If/when I realize that I cannot continue maintaining PyPDF2 (e.g. seeing that issues / PRs pile up), I give PyPDF2 to Jazzband or to a person who is willing to put in that much effort.

@mstamy2
Collaborator Author

mstamy2 commented Apr 7, 2022

@MartinThoma as mentioned on #666 you've certainly invested enough to take ownership already - may want to check if any URLs break from it

Really like the idea of an org and dedicated place for community scripts and test PDFs. There are so many non-conforming and just plain erratic PDFs in the wild, and it's a good idea to try and support them (within reason)

@py-pdf py-pdf locked and limited conversation to collaborators Apr 9, 2022
@MartinThoma MartinThoma converted this issue into discussion #685 Apr 9, 2022
@MartinThoma MartinThoma unpinned this issue Apr 16, 2022


8 participants