Write() refactor to use new line wrapping code #346
*Values of csv files are converted by position, instead of content * Updated tests to check for regression * Updated documentation and tests to include multiline text.
restrict decimal separator replacement to float fields
I had wondered why those tests failed on my system, and assumed it was because I have different versions of these fonts installed.
Heh, now I had a closer look... Those tests use ...which now gets eaten and never appears in the file. And to make things worse, eliminating that character causes all following characters to be indexed differently, which means the files are not different by just one character, but actually in most of the lines. What is the best solution here? Edit: I went with the second option. It was easy enough to implement.
Codecov Report
@@ Coverage Diff @@
## master #346 +/- ##
==========================================
+ Coverage 90.58% 90.85% +0.26%
==========================================
Files 20 20
Lines 5885 5936 +51
Branches 1182 1199 +17
==========================================
+ Hits 5331 5393 +62
+ Misses 331 320 -11
Partials 223 223
I'm reviewing the code right now.
fpdf/__init__.py
Outdated
@@ -39,6 +41,8 @@
"__license__",
# Classes
"FPDF",
"X",
This is really short for a class name...
It's not good if you read from fpdf import X, Y and you really don't know what those X & Y objects are!
I suggest renaming those enums to XAlign & YAlign.
What do you think @gmischler?
I see your point. They're not really alignments, though...
XPos & YPos?
OK, I'm fine with that!
fpdf/fpdf.py
Outdated
)

def _render_styled_cell_text(
    self,
    text_line,
The parameter class (TextLine) could be added as annotation here, to improve readability.
What do you mean by "annotation"?
It is mentioned in the docstring.
I meant type hints, like this:
def _render_styled_cell_text(
    self,
    text_line: TextLine,
Ah, I've never done that before...
I'll figure it out!
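For readers who are equally new to type hints, here is a minimal sketch of what the suggestion amounts to. The `TextLine` stand-in and the `render_line` helper are hypothetical and exist only for illustration; they are not the fpdf2 definitions.

```python
from typing import NamedTuple, get_type_hints


class TextLine(NamedTuple):
    """Stand-in for illustration only; the real class carries more fields."""

    text_width: float


def render_line(text_line: TextLine, scale: float = 1.0) -> float:
    """Hypothetical helper: return the scaled width of a line."""
    return text_line.text_width * scale


# The annotations are plain metadata that tools and readers can inspect.
hints = get_type_hints(render_line)
print(hints["text_line"].__name__)
```

The annotation does not change runtime behaviour; it documents the expected parameter class right in the signature, which is the readability gain the reviewer asks for.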
fpdf/line_break.py
Outdated
@@ -75,6 +76,8 @@ def __init__(self):
# SpaceHint is used fo this purpose.
# 3 - position of last inserted soft-hyphen
# HyphenHint is used fo this purpose.
# If print_sh=True, soft-hyphen is treated as
Shouldn't this big comment be converted to a docstring,
so that it gets rendered in the API docs?
https://pyfpdf.github.io/fpdf2/fpdf/
I don't think this class is part of the public API, and most of the comment is about its internals.
But the part I added about the print_sh parameter should indeed go into a docstring.
fpdf/line_break.py
Outdated
# HYPHEN is inserted instead of SOFT_HYPHEN
character = HYPHEN
return self.size_by_style(character, style)

def get_line_of_given_width(self, maximum_width):
# pylint: disable=too-many-return-statements
def get_line_of_given_width(self, maximum_width, no_wordsplit=False):
Parameters with "no" in their names should be avoided, regarding code readability...
Could we replace this by wordsplit=True maybe?
I named it this way because I like it when the default for an unusual condition is False... 😉
But yeah, wordsplit=True is probably easier on the eyes.
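To make the effect of such a flag concrete, here is a deliberately simplified sketch. It is not the `MultiLineBreak` implementation: the `take_line` function is hypothetical, width is counted in characters rather than measured with font metrics, and soft hyphens are ignored.

```python
from typing import Tuple


def take_line(text: str, max_width: int, wordsplit: bool = True) -> Tuple[str, str]:
    """Return (line, rest); width is measured in characters for simplicity."""
    words = text.split(" ")
    line = ""
    for i, word in enumerate(words):
        candidate = word if not line else line + " " + word
        if len(candidate) <= max_width:
            line = candidate
            continue
        if line:
            # Normal case: break at the last space that still fits.
            return line, " ".join(words[i:])
        if wordsplit:
            # A single word is wider than the line: cut it apart.
            return word[:max_width], " ".join([word[max_width:]] + words[i + 1:])
        # Word splitting disabled: return an empty line so the caller
        # can move on to a fresh, full-width line.
        return "", " ".join(words[i:])
    return line, ""


print(take_line("extraordinary day", 5))
print(take_line("extraordinary day", 5, wordsplit=False))
```

With the positive name, the call site reads as `wordsplit=False` where splitting is unwanted, instead of the double negative `no_wordsplit=True`.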
fpdf/fpdf.py
Outdated
# Font styles preloading must be performed before any call to FPDF.get_string_width:
txt = self.normalize_text(txt)
styled_txt_frags = self._preload_font_styles(txt, markdown)
return self._render_styled_cell_text(
    TextLine(styled_txt_frags, 0.0, 0, False),
The text_width, number_of_spaces_between_words & justify parameters could be passed by name here, to improve readability.
will do.
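A short sketch of the difference, assuming `TextLine` behaves like a named tuple with the three fields named in the review. The name of the first field (`fragments`) is an assumption made for this example.

```python
from typing import NamedTuple, Tuple


class TextLine(NamedTuple):
    """Illustrative stand-in; the first field name is an assumption."""

    fragments: Tuple[str, ...]
    text_width: float
    number_of_spaces_between_words: int
    justify: bool


frags = ("Hello world",)

# Positional form: the reader has to look up what 0.0, 0 and False mean.
positional = TextLine(frags, 0.0, 0, False)

# Keyword form: each value is labelled at the call site.
by_name = TextLine(
    frags,
    text_width=0.0,
    number_of_spaces_between_words=0,
    justify=False,
)

print(positional == by_name)
```

Both calls build the same object; only the second one is self-explanatory when read in a diff.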
fpdf/fpdf.py
Outdated
@@ -2446,6 +2530,8 @@ def multi_cell(
max_line_height (int): optional maximum height of each sub-cell generated
markdown (bool): enable minimal markdown-like markup to render part
    of text as bold / italics / underlined. Default to False.
print_sh (bool): Treat a soft-hyphen (\\u00ad) as a normal printable
character, instead of a line breaking opportunity. Default value: False
Just a detail: could you align the start of this docstring line to match the "of text as bold" line above please?
Vim getting confused about something... will do.
fpdf/fpdf.py
Outdated
Possible values are: `L` or empty string: left align (default value) ;
`C`: center ; `R`: right align
newpos_x (Enum X): New current position in x after the call.
    X.LEFT - left end of the cell
Shouldn't those details about all possible values be moved into docstrings in the enums definitions?
Since the Enums are meant for (eventual) public consumption...
Yeah, sounds like a good idea.
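One way this could look is sketched below, using the `XPos` name agreed on earlier in the review. Only the "left end of the cell" wording comes from the diff; the class docstring, the `RIGHT` description and the member values are assumptions. The string literal placed after each member relies on a convention that documentation generators such as pdoc follow, so whether it renders depends on the tool in use.

```python
from enum import Enum


class XPos(Enum):
    """Horizontal position of the cursor after a cell has been rendered."""

    LEFT = "LEFT"
    """Left end of the cell."""

    RIGHT = "RIGHT"
    """Right end of the cell (assumed wording)."""


# The class docstring is available at runtime; the per-member strings are
# plain expression statements that doc tools read from the source code.
print(XPos.__doc__)
print([member.name for member in XPos])
```

With the values documented on the enum itself, the method docstrings only need to reference the enum instead of repeating every option.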
fpdf/fpdf.py
Outdated
# adjustment before each space
if self.ws and self.unifontsubset:

word_spacing = 0
What is the relation of this variable with self.ws?
Could a short comment be added here please, explaining things a little?
will do.
Hard to tell by eye. I was hoping that the new algorithm is just particularly efficient... 😉 On closer inspection, it does seem that the right margin is about one mm (~= Edit: |
Looks good to me!
Great work there @gmischler, thank you.
Ping me when you'd like this PR to be merged 😊
Also, I have just merged #347 so you will have to rebase
I've noticed... 😉
Ok, I think this is ready to be merged now. I changed And after black had annoyed me about it all that time, I finally gave in and let it fix the (entirely unrelated) spacing on two lines of drawing.py. There's an error with black in the 3.10 tests, but unfortunately it doesn't tell us what it thinks is wrong. Black doesn't complain here locally, so what's this about?
I made some tests with
I think you should leave
Weird. Oh, and "ping", @Lucas-C 🔔
Fixes #340
Checklist:
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
docs/ folder
CHANGELOG.md
After changing .write() to use the new line wrapping code, it now handles soft hyphens.
Besides .write() itself, changes were necessary in two other methods:
- ._render_styled_cell_text() needed to learn how to set the current position to the end of the rendered text (for .write() really a bit left of the actual end).
- When MultiLineBreak.get_line_of_given_width() ends up with a single word taking more space than the whole line, it just splits it apart unceremoniously (with not even a hyphen). Since .write() often starts on an already partially populated line where the remaining available space may be very short, this could happen frequently but is not desired in this context. I added an option no_wordsplit=False to prevent that when set to True. If the current line has characters added but no break hint, then it will rewind its internal state and return an empty line, causing .write() to switch to a new line.
I have refactored the purely internal ._render_styled_cell_text() to use newpos_x=X.RIGHT and newpos_y=Y.TOP in place of ln=0. This opens the way for many other options.
Not all of those have an immediately obvious use case, but they're essentially free and someone might find them practical for their purposes.
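The idea of choosing the new position independently in x and y can be sketched as follows. This is an illustration under assumptions, not the code of this PR: the `position_after_cell` helper is hypothetical, the enums use the `XPos`/`YPos` names from the review, and `YPos.BOTTOM` is an invented member that only shows how further options would slot in.

```python
from enum import Enum
from typing import Tuple


class XPos(Enum):
    LEFT = "LEFT"    # left end of the cell
    RIGHT = "RIGHT"  # right end of the cell


class YPos(Enum):
    TOP = "TOP"        # stay on the same line
    BOTTOM = "BOTTOM"  # hypothetical member, for illustration only


def position_after_cell(
    x: float, y: float, w: float, h: float, newpos_x: XPos, newpos_y: YPos
) -> Tuple[float, float]:
    """Compute the cursor position after rendering a cell of size w x h at (x, y)."""
    new_x = x + w if newpos_x is XPos.RIGHT else x
    new_y = y + h if newpos_y is YPos.BOTTOM else y
    return new_x, new_y


# Equivalent of the old ln=0: continue to the right of the cell, same line.
print(position_after_cell(10, 20, 50, 5, XPos.RIGHT, YPos.TOP))
```

Because x and y are decided separately, every combination of members is available without adding more magic numbers to a single `ln` parameter.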
This change demonstrates that the concept works very nicely and intuitively. The next step will be to add them to the API of the public methods and to deprecate ln=# (along with center=) in a follow-up PR.
There used to be no tests whatsoever for .write() itself; it only got tested incidentally because it is used by html.py. I've written some tests to verify the basic functionality, as well as that it handles page breaks and soft hyphens correctly.
At that opportunity, I've renamed the directory test/cells to test/text and collected all text related tests in there.
Additionally I created tests to verify that ._render_styled_cell_text() determines the new positions correctly, since not all of them are currently exposed to the public API.
To make the code more flexible for future additions, I had to move the logic that sets the word spacing for justified text from .multi_cell() to ._render_styled_cell_text(). While doing so, I managed to optimize the PDF file size a bit, by only emitting "Tw" commands when actually necessary, and shortening the explicit word positioning values for unicode fonts to three decimals (analogous to "Tw").
I also noticed that ._perform_page_break() caused unnecessary "Tw" entries in the PDF. It makes sense to reset "Tw" to the default 0 at the end of each page so that each page starts with a clean slate (and other software can extract individual pages without causing trouble). But recreating a non-0 value at the beginning of the new page is pointless, since typically each line will use a different value anyway. I removed that part, resulting in cleaner output.
The following PDFs needed to be changed to pass existing tests:
write_html() produces a lot of spurious whitespace, which should probably get fixed...).
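The "Tw" optimization described above can be sketched like this. It is a simplified model, not the fpdf2 code: the `WordSpacingWriter` class and its output formatting are assumptions, while "Tw" itself is the PDF operator that sets word spacing.

```python
from typing import List


class WordSpacingWriter:
    """Collects PDF commands and emits "Tw" only when the value changes."""

    def __init__(self) -> None:
        self.current = 0.0  # PDF default word spacing
        self.out: List[str] = []

    def set_word_spacing(self, ws: float) -> None:
        ws = round(ws, 3)  # three decimals are enough, keeps the file small
        if ws == self.current:
            return  # nothing to emit, the value is already in effect
        self.out.append(f"{ws:.3f} Tw")
        self.current = ws

    def end_page(self) -> None:
        # Reset to the default so every page starts with a clean slate.
        self.set_word_spacing(0.0)


writer = WordSpacingWriter()
writer.set_word_spacing(0.0)      # default already active: no output
writer.set_word_spacing(1.23456)  # emitted, rounded to three decimals
writer.set_word_spacing(1.2351)   # rounds to the same value: no output
writer.end_page()                 # reset emitted once
print(writer.out)
```

Tracking the value currently in effect is what makes redundant commands easy to skip, including the reset at the end of a page that never changed the spacing.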