-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Long HTML list entries are not correctly indented #1073
Comments
@gmischler: this seems related to the text regions / paragraph rendering logic: would you like to have a look at this? 🙂 |
I think it has always behaved like this. Of course, that's no reason to keep it that way... There are several possible solutions to this:
I'm currently still busy with other stuff, but I don't think this should be very hard if anyone wants to try. |
There must be a reason, however I am not aware and maybe either of you could enlighten me.
This solves all of these issues and I have been using this code without encountering any issues* so far, however as I said maybe there is something I am not aware off that makes this a bad idea. *It does require modification of li elements so that it continues to use the current indent method when not inside a list. |
The cleaner solution than manipulating the global l_margin would be to give |
Hello. I want to start contributing to the project. Is the issue still open? If so, what needs to be done? |
At first glance you will need to modify Paragraph (include indent parameter) and TextRegion on text_region.py, and do some modifications on html.py to add the indent value on each paragraph. On Paragraph you will need to add the indent value to the margins when calling MultiLineBreak and on TextRegion you need to apply the indent value to modify pdf.x before calling render_styled_text_line |
Apologies that it's taking me so long. I could not dedicate as much time as I intended on this project, but I seem to have finally internalised what the relevant parts of the code do. I have partially implemented the suggested changes, and now I have two questions:
|
Do not worry for the time it takes you, contributing to this project is voluunteer work, we all just do our best with the time we can dedicate to it 🙂 First, not that a first implementation of this tag indentation has been be added recently, in this PR: #1124
It also seems like the best behaviour to me!
I'm not exactly sure of what you experimented exactly. You could open a draft PR with the changes you attempted, so that we can better answer your question "Is that behaviour expected?" 🙂 |
I actually did forget that that should be accounted for. Thank you for reminding me.
I have tried both editing the values of
That question is more general, as I don't see the difference with or without the changes introduced by me. Initially, I did the experimentation with the current EDIT: Apologies for disappearing - I got sick. I am better now and will be resuming work on the project. |
Current thoughts:
I think I rubber-duckied myself out of the other questions and concerns that I had for now. EDIT: And now I got problems with my OS becoming unresponsive on login. I will have to deal with that first. |
EDIT2: apologies, this is me being dumb. I did not notice that the relevant files have multiple pages. I see the issues now. I seem to have implemented the feature, however, some of the old tests are not being passed right now. Visually, I am not noticing any difference, but the hash of the files does differ. qpdf is installed and its
There are also some jpype-related failed tests, but they fail without any changes done by me, as well. I will try to examine what the issues are, but I would like to ask for help here. The relevant code is in the Also, nested lists do not behave correctly, which seems to be caused by new paragraphs starting with a newline, meaning that a list nested within a previous one will be shifted by one line down from where it should be normally. I will be submitting a ticket shortly. EDIT: qpdf seems to work now, but the issues persist. I am not noticing any differences so far:
|
I'm currentlty on holidays, but unless @gmischler or @andersonhc beats me to it, I'll get back to you on Monday 🙂 |
I think I have a good idea of what the problem in my code is and will be solving it later tonight. I will likely need to design a way of checking if a relevant fragment of a paragraph is an 'indentation element', i.e. that it is a fragment which sets the indentation for the paragraph. The problem is that what I am checking seems to be a fragment with just a newline element, such as at the beginning of a paragraph, instead of things that precede HTML list items. |
I think supporting different "indentation elements" within a single paragraph makes things unnecessarily complicated. I recommend to only set a fixed indent and an optional bullet/numbering symbol when creating a paragraph, and never to change those while writing to it. Btw: This is essentially also the way we're used to think about paragraphs on paper: A homogenous block of text. |
An 'indentation element' in this context would be something like this string When the relevant fragment is written into the paragraph, it is marked as an 'indentation element' via a new attribute of the class Later, in My intention with having some fragments be identified as 'indentation elements' is to make it so that parts of the code that would dictate relevant indentation could be edited without the need to touch the code that would calculate indents for lines and paragraphs. In the case I will finish this code the way I see it, it should be easy to generalise for the other HTML elements that feature indentation.
A problem that I see with assigning indentation on paragraph creation in the cases of starting tags, such as
I did want to ask for a permission to at least make it so that each |
This is exactly what we want to get rid of here. The paragraph should only receive the bullet and place that correctly during rendering, depending on the indent it has been given.
Firstly, I don't think there is any need to modify the Fragment code. That would only make things unnecessarily complicated.
You don't need a special permission to fix bad code. Turning the |
I am implementing the indentation feature, with the input from the last reply taken into account. However, I want to ask for some advice:
Currently, a piece of relevant code of mine in if first_line and bullet:
if text_line.align == Align.R:
dx = w - l_c_margin - styled_txt_width
elif text_line.align in [Align.C, Align.X]:
dx = (w - styled_txt_width) / 2
else:
dx = l_c_margin
bullet_normalized_string = self.normalize_text(bullet).replace("\r", "")
# TODO: fix the following line
bullet_fragment = self._preload_font_styles(bullet_normalized_string, markdown=False)
bullet_width = bullet_fragment.get_width()
sub_indent = 1
bullet_text = bullet_fragment.render_pdf_text(
0,
current_ws,
0,
self.x - bullet_width - sub_indent + dx + s_width + indent,
self.y + (0.5 * h + 0.3 * max_font_size),
self.h,
)
if bullet_text:
sl.append(bullet_text) |
I will try to make a pull request tomorrow or a day after that. I will need to restructure some code, so that a bullet would be a separate attribute of a paragraph, of a Unless I get further input, I will be setting default tag indents and default indentation between bullets and corresponding lines to match the old indentation. I also have another question, but that falls under another topic. |
Apologies for the delay. Had issues with both health and my primary IDE. I have fixed the issue for
edited by gmischler: fixed formatting |
It took me a while to figure out what you mean by that, but I think you're doing the right thing here. Although I'm not sure if
That sounds like what we want.
Your goal is to improve on the behaviour of the current code. If you manage that, then of course it is not only acceptable but necessary to adapt the test files to the new and more correct result.
The way that we currently handle line heights in HTML is in many places rather random. If you can make it more systematic and predictable, by any means do so.
This example is horribly invalid HTML. I recommend to only worry about the rendering of standards conforming HTML.
Despite its HTML defects, your snippet should result in 4 seperate paragraphs (if dt/dd are handled correctly). In our code,
You do NOT want to modify Setting up the temporarily modified environment (current x/y etc.) correctly, creating a Consider for example that in the future, we might want an option to place the bullet vertically centered next to the paragraph instead of at the top, and the need to keep things localized and seperate from each other will become much more obvious. |
I got that from
Seems to be rendered elsewhere, but I will adhere to the provided clarification.
Alright. Will correct my code. |
Seems like I am mostly done. The current version of my code is in the What is left primarily is for me to go over the test conflicts again, and to make new ones. Also, I did introduce a |
So it is based on arbitrary character widths of an arbitrary font. Can you use something more straightforward like 8 or 10 mm?
Browsers will make a best-effort attempt at rendering even the most horrible tag soup, which is one reason why the rendering engines have become such bloated monsters nowadays.
Cool!
You'll have to do "something" with that when the
We don't currently have a very stringent concept of what stuff should be proteced and what shouldn't. |
Apologies - I got sick again. Will try to finish with the tests tomorrow.
|
Describe the bug
When list enties which are larger than a line length, and wrap they are not indented.
Minimal code
Environment
Please provide the following information:
fpdf2
version used:2.7.7The text was updated successfully, but these errors were encountered: