Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add path drawing API #196

Merged
merged 73 commits into from
Jan 10, 2022
Merged

Add path drawing API #196

merged 73 commits into from
Jan 10, 2022

Conversation

torque
Copy link

@torque torque commented Jul 29, 2021

This adds an API for drawing arbitrary open/closed paths, including support for arbitrary affine transformations and cubic Bézier curves as supported by the PDF format, which provides a superset of the functionality of FPDF's current shape drawing functions.

This is exposed through use of context managers, which allows the addition of arbitrary path elements (lines and curve) and manages finishing and rendering the output. This API provides a somewhat more rigid structure than is dictated by the PDF format in order to make it very difficult to produce degenerate paths/drawings.

The path generation is implemented and rather fully featured relative to the PDF 1.3 specification. However, it currently has very mediocre test coverage, some pylint complaints (some of which will need to be silenced), a dearth of docstrings, and I haven't written any formal documentation for it yet, either. I'm opening this PR now mostly because this is a fairly large chunk of code, and I want to get some feedback on the API and make sure this a worthwhile addition before I put more effort into the details.

In addition to the current unit tests, I have a slightly more complex example shown in this gist.

TODO:

  • have the code pass the pylint checks
  • add a Markdown documentation page for drawing API in docs/
  • add a Markdown documentation page for SVG cpmverter in docs/
  • add a short mention of this to CHANGELOG.md
  • add docstrings to public functions/classes/methods
  • reasonable unit test coverage for drawing API
  • reasonable unit test coverage for SVG converter

@alexp1917
Copy link
Collaborator

@torque very cool and thanks for asking for feedback early on, and sorry for the delay in response, i only really have time to work on this on the weekends. I would need a bit more time to develop an opinion on whether or not this API is good. I think one of the original reasons for this is to be able to parse svgs and apply them to a document. does python have a way to meta-program @contextmanagers in this way so that this work can be re-used for that purpose? I don't know. a lot of python projects (this one included, not just the PR) are self serving, so maybe we should be satisfied with this current way of doing things which is very verbose but really cool, even if its not very flexible.

overall incredible stuff -- just not ready to immediately jump in and give a 1000 line PR the attention it needs, maybe within the next week ill do this

can you explain what the alternative might look like if the api wasnt so restrictive and automatic? i think the primitives most people want are like more basic than this -- which i think can be built on top of what you wrote -- along with a way to just feed it an svg string. anyways, looks like the right direction - but needs more clarity on specifics. maybe, and this is where anyone who works with python more than i do can feel free to correct me, this can be remedied by separating some of these classes into their own modules. some seem redundant - if the stack is being maintained by contextmanagers, then is the pop and push class necessary? The theme here i am trying to figure out what the actual first commit looked like. unless you just sat down with the idea and typed out the whole thing in an hour, in which case, ill do my best to keep up.

Copy link
Collaborator

@alexp1917 alexp1917 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting stuff, just seeing it now (will need some time to fully review)

@torque
Copy link
Author

torque commented Aug 1, 2021

My initial goal for writing this was actually to allow fonts to be converted to paths (and thus not needing to embed the font in the PDF, as I have worked with print shops that can't use embedded fonts). At some point I decided it made sense to provide an entire API for doing path drawing, as once that's done, it becomes trivial to write a pen implementation for the fonttools library that draws the glyphs, but doing the actual text layout is a complex enough problem that I've decided not to tackle it for now. Rendering simple SVGs to PDF would be straightforward with this API, though SVG is a complicated enough format that I expect there would be a lot of edge cases that would be somewhere between nontrivial and impossible to handle. I could make a toy example that shows off the trivial case for this one, but that's something that would make sense to build on top of this, rather than into it, in my opinion.

The main motivation for the use of contextmanagers here is that it saves the user from having to remember to make separate calls to start and end the path, but most of the API is usable without actually needing to use a contextmanager (the only one that doesn't really have an alternative is actually rendering the path to the PDF, but that could remedied very easily). I'll come up with some example that demonstrates use of the non-contextmanager API.

The API I have crated isn't actually restrictive from a functionality perspective, as it should allow any shape to be drawn. However, the basic PDF path drawing specification is itself fairly unstructured, so it's a lot of work for no real benefit for a structured API to produce what I would consider the "minimal representation" of a specific path. An example of this is transforms: a transform only affects the points that are defined after it, so it can be "safe" to use a transform without pushing a new graphics context if all remaining path points should be transformed (e.g. q 1 1 m 2 2 l 2 0 0 2 0 0 cm 1 2 l S Q is equivalent to q 1 1 m 2 2 l q 2 0 0 2 0 0 cm 1 2 l Q S Q in terms of output, but the latter is always produced by the API because it's only guaranteed to be "safe" to use transforms after pushing a new graphics context, as popping the graphics context is the only mechanism PDF provides to revert the application of a transform). From an opposite perspective, painting attributes like the stroke width can be set multiple times and the rendered PDF will only use the last one defined in the graphics context in which the path is painted. Having the API make these properties of the painted path makes it harder to define no-op painting attributes in the middle of the path while still preserving the last-set rule. I suspect this explanation would be greatly aided by some real examples, so I'll work to add some.

As for the classes for pushing/popping the graphics state, this is currently built to perform lazy rendering (where rendering in this case just means encoding the path representation into the PDF text representation), so all path items are represented as classes that have a render method that outputs the appropriate string representation. This approach can simplify debugging, as it allows outputting a tree of the in-memory object structure, and having each element in the tree belong to a unique class simplifies producing this output (this structure is effectively recursively flattened during the rendering process, so in the general case it can't be reconstructed after the fact). However, Python's object mutability can cause this to have surprising results if objects in the render tree are modified between being added and being converted, which is something I plan to address by copying paths instead of just storing references to them when adding them to the render tree.

@torque
Copy link
Author

torque commented Aug 2, 2021

Since it made an interesting challenge and was a good way to stress test the API, I went ahead and wrote a very sloppy and rudimentary SVG-to-PDF converter that supports simple SVGs. As I mentioned, due to SVG being a complex format and having some features that don't map 1-to-1 to PDF, it would be quite an effort to support more than this (as is, the current version has some issues that I deem acceptable for something I hacked up as a proof-of-concept in 3 hours).

I put the demo in a gist. There's a readme with some info, the script itself, the input, and the output. You may be most interested in the main function which creates the PDF and draws the path to it. This demonstrates use of only one context manager.

Some notes from the readme I put on there: this is a somewhat nontrivial example, as it works on a SVG that includes internal cross references as well as transforms. However, some native SVG path operations are not supported, and there are a few issues with the PDF output:

  • no transparency. This wasn't added to the PDF rendering model until 1.4, so it's not supported.
  • thin lines between the lobes of the bars. Stroke width 0 in PDF means "thinnest possible line" rather than "no line", and there's not a good way to make this match the SVG rendering model without propagating the (lack of) existence of the stroke down through all of the graphics contexts to the path itself, which means there's not a nice way to handle this. I'd rather this API match the PDF rendering model than try to contort itself to match the SVG rendering model.
  • reused painted paths (the bars) have redundant painting operations appended on rendering. This is because the path currently mutates itself on render to insert the painting operation. This is planned to be fixed, but it doesn't affect the visual output.

@Lucas-C
Copy link
Member

Lucas-C commented Aug 3, 2021

Wow! This is definitively a worthwhile addition! Awesome work @torque 😲

A few questions overall:

  • why aiming for PDF 1.3 compatibility only? You mentioned transparency being available starting from PDF 1.4, and that seems very useful.
  • your SVG-to-PDF converter is great! Would you agree to include it in your PR?
    I think it could live in fpdf2, as we are already hosting a rudimentary HTML-to-PDF converter.
    Would you be OK to drop the dependency on BeautifulSoup and use the standard xml module instead to parse SVGs?

Now jumping in the discussion and replying to the previous comments:

@torque The main motivation for the use of contextmanagers here is that it saves the user from having to remember
to make separate calls to start and end the path

This looks like idiomatic Python to me, and is a handy way to ensure users do not shoot themselves in the foot 😊

@torque This approach can simplify debugging, as it allows outputting a tree of the in-memory object structure

I consider this a great sowftare engineering choice, good job!

@torque: However, Python's object mutability can cause this to have surprising results if objects in the render tree are modified between being added and being converted

Maybe a solution to that problem could be to use immutable objects?
Currently fpdf2 relies on typing.NamedTuple to implement them in several modules.
What do yout think @torque?

As a reminder, there is a list of things that should probably be included in this PR before merging it:

  • have the code pass the pylint checks
  • add a Markdown documentation page in docs/
  • add a short mention of this great work in CHANGELOG.md
  • ensure the test coverage of the new module is OK

I can help with any of those items, just wave if you'd like a hand @torque 😉

While I undestand that this is stil a work-in-progress, let me already add you to the list of contributors:

@Lucas-C
Copy link
Member

Lucas-C commented Aug 3, 2021

@all-contributors please add @torque for code

@allcontributors
Copy link

@Lucas-C

I've put up a pull request to add @torque! 🎉

@py-pdf py-pdf deleted a comment from allcontributors bot Aug 3, 2021
Copy link
Member

@Lucas-C Lucas-C left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great job in structuring this code!
It's really well designed and clear overall
🎉

fpdf/fpdf.py Show resolved Hide resolved
fpdf/fpdf.py Outdated Show resolved Hide resolved
fpdf/fpdf.py Outdated Show resolved Hide resolved
fpdf/fpdf.py Outdated Show resolved Hide resolved
fpdf/drawing.py Outdated Show resolved Hide resolved
fpdf/drawing.py Outdated Show resolved Hide resolved
fpdf/drawing.py Outdated Show resolved Hide resolved
fpdf/drawing.py Outdated Show resolved Hide resolved
fpdf/drawing.py Outdated Show resolved Hide resolved
fpdf/drawing.py Outdated Show resolved Hide resolved
@torque
Copy link
Author

torque commented Aug 4, 2021

Thanks for the review, I'll start going through it soon. To answer some of your immediate questions,

why aiming for PDF 1.3 compatibility only? You mentioned transparency being available starting from PDF 1.4, and that seems very useful.

Since 1.3 is the default version of PDF that fpdf outputs, this seemed to be a safe target, so it was what I started with. Additionally, it appears that transparency in the graphics state cannot be managed via directly emitting drawing directives and is instead specified through the graphics state parameter dictionary, which I found to be much less clearly documented in the PDF specification. The graphics state parameter dictionary does seem to be the more proper way to manage the graphics styling since PDF 1.3 (somewhat unfortunately, in my opinion), so I should probably put in the time to actually understand it and implement it before we land this.

your SVG-to-PDF converter is great! Would you agree to include it in your PR?
I think it could live in fpdf2, as we are already hosting a rudimentary HTML-to-PDF converter.
Would you be OK to drop the dependency on BeautifulSoup and use the standard xml module instead to parse SVGs?

I would be happy to tidy it up a bit and include it, with the caveat that it is mostly a proof of concept that others can work to improve in the future. There are a variety of core SVG features that it doesn't handle at all, such as styling via embedded CSS, that I do not plan to add to it due to the complexity of those features. My main concern would be that because it can only handle a limited subset of SVGs, users may be disappointed when it fails to properly convert arbitrary SVGs they have. If you're okay with that limitation, I will include it with this pull request. I will need to look at Python's stdlib xml module in more detail, but it seems that the xml.etree API is similar enough that it will not be terrible to switch over.

Currently fpdf2 relies on typing.NamedTuple to implement them in several modules. What do you think?

This is not a bad suggestion. I was shying away from using typing because Python 3.6 is supported, and it lacks the ability to enable postponed annotation evaluation, which I have found can make it difficult to use annotations due to declaration ordering issues. I'll take another look at this, though, as it may offer a nice simplification of some of the trivial classes in here. Unfortunately, it doesn't completely solve the problem because GraphicsContext objects just contain a list of all of the items within them (which may include other GraphicsContexts) in a normal list, which can be modified even as part of a NamedTuple (and has to be modifiable, as this is the general mechanism for sequentially appending path drawing operations). I think implementing and using copy.deepcopy will also help to mitigate this, but I don't know if there is a good way to fully ensure the objects can't be mutated. I think this is also something that having documentation explain the caveats will help with.

I also appreciate the checklist, that'll a good starting point to keep track of the details. I'll add it to the main PR text (I think this causes github to add a progress bar or some other UI element).

@Lucas-C
Copy link
Member

Lucas-C commented Aug 4, 2021

Sounds great!

it is mostly a proof of concept that others can work to improve in the future. There are a variety of core SVG features that it doesn't handle at all, such as styling via embedded CSS, that I do not plan to add to it due to the complexity of those features.

I'm totally OK with that. Maybe just include a comment along those lines in your PR and that will be perfect.
Other contributors will be able to build up on this excellent starting point if they want more features. And our current HTML-to-PDF converter does not support CSS styling either.

it can only handle a limited subset of SVGs, users may be disappointed when it fails to properly convert arbitrary SVGs they have. If you're okay with that limitation, I will include it with this pull request.

I'm entirely OK with that 😉
Again, just a few words in the docstring should be enough to be clear about the capacities of this converter.

@torque
Copy link
Author

torque commented Aug 21, 2021

After creating my own scope creep and not having a lot of time to work on this, I believe I have landed on a level of functionality I am happy with. Notably, my SVG conversion demo now renders (as far as I can tell) more or less exactly correctly without any modifications to the source file. There are still a lot of unsupported SVG features, but all of the core path drawing directives (including arcs) are now implemented. I will add support for the standalone drawing elements (rect, circle, ellipse, line, polyline, and polygon), which should be straightforward (I do have a couple of nontrivial changes I want to make to the SVG conversion code structure still).

With the functionality mostly settled, I need to write most of the documentation, docstrings, and tests, but I'm hoping to make a reasonable dent the documentation tomorrow.

From a functionality perspective, I noticed at some point that FPDF mostly locates elements from the top-left corner of the page, but this API currently draws items relative to the standard PDF origin, which is the bottom-left corner. I suppose this should follow the FPDF origin convention, do you agree? theoretically this is as simple as scaling all drawings by -1 about the vertical midpoint of the page, which should be trivial to do.

Anyway, I'm hoping to get this wrapped up before the heat death of the universe arrives, though don't have much time outside of weekends to work on it, so it may be a couple of weeks out still.

@Lucas-C
Copy link
Member

Lucas-C commented Aug 21, 2021

FPDF mostly locates elements from the top-left corner of the page, but this API currently draws items relative to the standard PDF origin, which is the bottom-left corner. I suppose this should follow the FPDF origin convention, do you agree?

Yes, I think it is best that your code use a top-left corner origin, otherwise fpdf2 users will be disoriented.

Anyway, I'm hoping to get this wrapped up before the heat death of the universe arrives, though don't have much time outside of weekends to work on it, so it may be a couple of weeks out still.

Good look with the heat 🥵 🔥
No hurry really 😉

@Lucas-C
Copy link
Member

Lucas-C commented Aug 30, 2021

@torque Is it all going OK? 😊
Any help needed?

@torque
Copy link
Author

torque commented Sep 4, 2021

I think the main thing that's left on this is improving the test coverage, some minor tweaks to some rough edges of the API, and maybe some refinement of the documentation. The last couple of weeks ended up being busier than I expected, but I think I'll have some time to try to put a bow on it this weekend.

@alexp1917
Copy link
Collaborator

@torque if you want to switch to a policy of not force-pushing i can try to help write some of the tests - no promises that I'll understand what youre doing but i can just aim for maximum line/branch coverage. we probably shouldn't attempt to do that at the same time but its a three day weekend for me due to US Labor Day, so i can def find some time monday (and onwards)

@torque
Copy link
Author

torque commented Sep 5, 2021

Hey @alexp1917, I appreciate the offer to help write tests, I think that'll be a big help in getting this ready to merge. I have almost all of the API tweaks I wanted finished and docstrings on (I think) all the public APIs and most of the internal ones, so I am happy to stop force pushing (I've got one more change in mind that I will probably rebase in tonight).

Hopefully the documentation that I have written so far will help you to understand the API and help inform the test-writing process (and if it doesn't, that's a good sign the documentation would benefit from further improvement).

@torque
Copy link
Author

torque commented Sep 5, 2021

Okay, I think I have everything cleared up except for writing the tests and addressing any bugs that appear as a result of improved test coverage. I won't do any more force pushing until we have this in a state where this is ready to merge. @alexp1917, how do you want to coordinate on writing the tests?

@alexp1917
Copy link
Collaborator

@torque i have some free time today, so ill see what i can get done, i suppose

@alexp1917
Copy link
Collaborator

alexp1917 commented Sep 7, 2021

im seeing 14 failed tests? ok, i guess i can try to fix those, or maybe go for diff coverage...

but

the tests only fail when run as part of whole suite

@alexp1917
Copy link
Collaborator

alexp1917 commented Sep 8, 2021

ok - I've gotten myself acquainted with the code - suffice it to say that this will be an uphill battle. I'm not defeated yet, but at the same I think its a requirement that we print the same pdf with the same input arguments to the FPDF instance -- if only because the tests are structured this way. the tests are failing because of the global cache. I think this cache should be removed but I don't have time to do this today and am still investigating how to localize this cache to the FPDF instance. input is appreciated but if i dont hear back ill figure it out. is it realistic for me to spend 3 hours a day on this? no, but i will aim for that this week...

@Lucas-C I am actually curious if you agree with the italicized part, or rather, i hope you do, and wonder to what extent.

changes are here https://github.com/PyFPDF/fpdf2/commits/path-drawing_local_cache -- but they still fail in the same exact way - as if there is still some caching and ive changed nothing. feels_insane_man.jpg

@Lucas-C
Copy link
Member

Lucas-C commented Sep 8, 2021

I think its a requirement that we print the same pdf with the same input arguments to the FPDF instance -- if only because the tests are structured this way. the tests are failing because of the global cache. I think this cache should be removed

Are you refering to drawing._global_style_name_lookup?

I agree that global state should be avoided, if that is the question 😊

@torque
Copy link
Author

torque commented Sep 8, 2021

Ah, yes, I had been running the SVG and drawing tests independently, so I didn't catch that they fail when run in the same testing run. Whoops, that's a big oversight.

The naïve option for eliminating the global cache is to only deduplicate the graphics state context dictionaries per DrawingContext which will probably result in a lot of duplicates graphics state dictionaries being emitted in a document that uses more than one drawing. The better option here is probably to pass the current FPDF document to the DrawingContext and move the trivial graphics state dictionary lookup API onto the FPDF object.

I think the latter change should be safe and simple, I'll prototype it tonight.

@torque
Copy link
Author

torque commented Sep 9, 2021

@alexp1917 thanks for catching that bad behavior. I've reimplemented the graphics state registry to match the lifetime of an fpdf.FPDF object, which I think gives it a good document-level scope, and it also fixes the problems with the tests. I updated the rendered PDFs for the existing tests with the new behavior and the output still all looks good on my end and, importantly, the tests now appear to work regardless of the order I run them in.

@torque
Copy link
Author

torque commented Sep 15, 2021

@alexp1917 have you made any progress on the unit tests? I'm probably going to start spending some time on those in the evenings so we can hopefully get this merged some time this month.

@alexp1917
Copy link
Collaborator

@torque sorry ive been MIA for this - really rough week as far as workload and not sure next one will be better. this month is a good goal and I think it is doable. I want to merge this more than I can help, so i guess i can compromise.

These weren't getting converted to numbers, whoops.
This is an SVG 1.1 feature. It doesn't appear to be in SVG 2 which uses
CSS transforms. We don't distinguish between SVG 1.1 and SVG 2, and I
don't even know if SVG 2 is particularly well supported by anything.
Also add a couple of easily supportable ones that were missing.
This is a mess and it's confusing to me but this approach appears to
work.
Apparently this doesn't match the default PDF style.
This will hopefully make it easier to understand the tutorial.
Because these are coupled in the PDF serialization process, there's
some annoying jogging around that has to be done to make this work,
but it should hopefully make the API more intuitive to use. This also
makes some changes to the plumbing of the drawing.GraphicsStyle class
to make it more subclass friendly.
I honestly have no clue how pylint managed to propagate ellipsis to
this code, but miss that this if clause above specifically checks for
ellipsis and handles it separately. But also doesn't propagate it down
to the `render_list.append(intersection_rule.value)` call below.

Because this is a mystery, I hope this fix doesn't randomly fail on CI
again.
@torque
Copy link
Author

torque commented Jan 9, 2022

@Lucas-C I have rebased it on top of latest master. My local checks look good, though there's obviously room for error there :)

@Lucas-C Lucas-C merged commit 5b425c4 into py-pdf:master Jan 10, 2022
@Lucas-C
Copy link
Member

Lucas-C commented Jan 10, 2022

Aaaaaand MERGE!
Wow, it's done! 🥳
Congratulations on this massive work @torque.
You've been diligent to keep working on this PR for over 5 months, thank you very much 😲

@torque
Copy link
Author

torque commented Jan 11, 2022

huzzah!
@Lucas-C thank you for your diligent maintainership of this project, it made it easy to commit time and energy into a larger change like this, knowing it wouldn't end up abandoned at the other end.

@Lucas-C
Copy link
Member

Lucas-C commented Jan 22, 2022

This has now been released as part of v2.5.0

@Lucas-C
Copy link
Member

Lucas-C commented Apr 24, 2022

Hi @torque!
Just a heads-up that I mentioned your work in a recent blog article:
https://chezsoi.org/lucas/blog/fpdf2-5-2-svg-support-and-borb.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants