Indent HTML lists correctly (Issue 1073) #1170

Merged
merged 45 commits into from
Jun 15, 2024
Changes from 32 commits
f794193
intermediate commit to save progress. Debugging needed.
lcgeneralprojects May 5, 2024
20c035e
Feature mostly implemented.
lcgeneralprojects May 5, 2024
eb93711
Fixed the issue with indentation of nested lists.
lcgeneralprojects May 6, 2024
f8f17a5
Feature implemented.
lcgeneralprojects May 11, 2024
77a1a31
Feature implemented.
lcgeneralprojects May 11, 2024
e80d8d9
Feature implemented for <li>.
lcgeneralprojects May 12, 2024
9bddeca
Feature implemented for <li>.
lcgeneralprojects May 12, 2024
dbcce1f
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 13, 2024
fb59849
Issue mostly fixed.
lcgeneralprojects May 16, 2024
bc1fab8
Issue fixed.
lcgeneralprojects May 16, 2024
d487f7d
Changed `<ol>` bullets to not introduce an extra whitespace.
lcgeneralprojects May 16, 2024
2caa750
Added the `li_pseudo_margin` attribute to `HTML2FPDF`.
lcgeneralprojects May 17, 2024
ceaa6d3
Added the `list_pseudo_margin` attribute to `HTML2FPDF`.
lcgeneralprojects May 17, 2024
070a41d
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 17, 2024
4ab204e
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 19, 2024
7b8923c
Fixed the inappropriate `TextMode` importation.
lcgeneralprojects May 19, 2024
1e1eb29
Fixed the inappropriate `TextMode` importation.
lcgeneralprojects May 19, 2024
dc3d8f8
Merge remote-tracking branch 'origin/issue_1073' into issue_1073
lcgeneralprojects May 19, 2024
3f56811
Introduced new test `test_html_long_list_entries`.
lcgeneralprojects May 20, 2024
ce7cb9b
Adjusted `Changelog.md` and relevant docstrings.
lcgeneralprojects May 20, 2024
24626f9
Changed the name of the relevant variables from `list_top_margin` to …
lcgeneralprojects May 25, 2024
208e3b3
Adjusted html code strings in `test_hmtl_long_ol_bullets` for aesthet…
lcgeneralprojects May 25, 2024
8cceb1d
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 25, 2024
bf5f0fa
Added `self.pdf.normalize_text(bullet_string)` to `Paragraph.generate…
lcgeneralprojects May 25, 2024
82fbfda
Added `self.pdf.normalize_text(bullet_string)` to `Paragraph.generate…
lcgeneralprojects May 25, 2024
a75a948
Merge remote-tracking branch 'origin/issue_1073' into issue_1073
lcgeneralprojects May 25, 2024
fc38846
Adjusted handling of `fragment`s in the `Paragraph.generate_bullet_fr…
lcgeneralprojects May 25, 2024
2f69001
Added docstring to `Paragraph`.
lcgeneralprojects May 26, 2024
f7908e4
Used `black` on `html.py`
lcgeneralprojects May 26, 2024
5afb935
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 27, 2024
4e7118b
Merged changes to the branch `master` into the branch `issue_1073`.
lcgeneralprojects May 27, 2024
1ead6a3
Introduced conversion of 'magic numbers', and default tag indent and …
lcgeneralprojects May 31, 2024
619c250
Merge branch 'master' into issue_1073
lcgeneralprojects Jun 6, 2024
418f213
Introduced unit conversion for `li_tag_indent`.
lcgeneralprojects Jun 6, 2024
c6c8d8b
Updated test files
lcgeneralprojects Jun 6, 2024
18dddd7
Renamed `bullet_rel_x_displacement`, `bullet_rel_y_displacement` and …
lcgeneralprojects Jun 6, 2024
b345631
Undone changes to handling non-default values for `li_tag_indent` and…
lcgeneralprojects Jun 6, 2024
fb3305b
Requested changes to conversion of default values implemented.
lcgeneralprojects Jun 9, 2024
cc7f247
Changes to `test_html_measurement_units`.
lcgeneralprojects Jun 9, 2024
37a8d81
Adjusted `CHANGELOG.md`.
lcgeneralprojects Jun 11, 2024
f6eb85b
Added the 'bullet_r_margin' parameter to `ParagraphCollectorMixin.par…
lcgeneralprojects Jun 12, 2024
d847eb8
Merge branch 'master' into issue_1073
lcgeneralprojects Jun 14, 2024
14ddcc4
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects Jun 15, 2024
691ad2b
Merged changes from `master`.
lcgeneralprojects Jun 15, 2024
2c475ab
Update TextRegion.md
gmischler Jun 15, 2024
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -22,8 +22,13 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
### Fixed
* [`fpdf.drawing.DeviceCMYK`](https://py-pdf.github.io/fpdf2/fpdf/drawing.html#fpdf.drawing.DeviceCMYK) objects can now be passed to [`FPDF.set_draw_color()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_draw_color), [`FPDF.set_fill_color()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_fill_color) and [`FPDF.set_text_color()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_text_color) without raising a `ValueError`: [documentation](https://py-pdf.github.io/fpdf2/Text.html#text-formatting).
* individual `/Resources` directories are now properly created for each document page. This change ensures better compliance with the PDF specification but results in a slight increase in the size of PDF documents. You can still use the old behavior by setting `FPDF().single_resources_object = True`.
* Fixed incoherent indentation of long list entries - _cf._ [issue #1073](https://github.com/py-pdf/fpdf2/issues/1073)

### Changed
* [`FPDF.table()`](https://py-pdf.github.io/fpdf2/Tables.html) now raises an error when a single row is too high to be rendered on a single page
* `HTML2FPDF.handle_starttag()` now generates one `Paragraph` object for every `<li>` HTML element. Margins above lists are now handled with `<ul>` and `<ol>` tags.
* `HTML2FPDF.tag_indents` can now be non-integer. Indentation of HTML elements is now independent of font size and bullet strings.

### Deprecated

## [2.7.10] - 2024-05-18
@@ -43,7 +48,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
* a bug when rendering vector images with dashed lines that caused a warning message in Adobe Acrobat Reader
* ordering RTL fragments on bidirectional texts
* fixed type hint of member `level` in class [`OutlineSection`](https://py-pdf.github.io/fpdf2/fpdf/outline.html#fpdf.outline.OutlineSection) from `str` to `int`.
* SVG clipping paths being incorrectly painted - _cf._ [issue #1147](https://github.com/py-pdf/fpdf2/issues/1147)]
* SVG clipping paths being incorrectly painted - _cf._ [issue #1147](https://github.com/py-pdf/fpdf2/issues/1147)
* new translation of the tutorial in [Polski](https://py-pdf.github.io/fpdf2/Tutorial-pl.html) - thanks to @DarekRepos
### Changed
* improved the performance of `FPDF.start_section()` - _cf._ [issue #1092](https://github.com/py-pdf/fpdf2/issues/1092)
4 changes: 4 additions & 0 deletions docs/TextRegion.md
@@ -84,6 +84,10 @@ For more typographical control, you can use the following arguments. Most of tho
* line_height (float, optional) - factor by which the line spacing will be different from the font height. (default: by region)
* top_margin (float, optional) - how much spacing is added above the paragraph. No spacing will be added at the top of the paragraph if the current y position is at (or above) the top margin of the page. (Default: 0.0)
* bottom_margin (float, optional) - Those two values determine how much spacing is added below the paragraph. No spacing will be added at the bottom if it would result in overstepping the bottom margin of the page. (Default: 0.0)
* indent (float, optional): determines the indentation of the paragraph. (Default: 0.0)
* bullet_rel_x_displacement (float, optional) - determines the relative displacement of the bullet along the x-axis. The distance is measured from the rightmost point of the bullet to the leftmost point of the paragraph's text. (Default: 2.0)
* bullet_rel_y_displacement (float, optional) - determines the relative displacement of the bullet along the y-axis. The distance is between the topmost point of the bullet and the topmost point of the paragraph's text. (Default: 0.0)
* bullet_string (str, optional): determines the fragments and text lines of the bullet. (Default: "")
* skip_leading_spaces (bool, optional) - removes all space characters at the beginning of each line.
* wrapmode (WrapMode, optional)
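
The bullet parameters documented above describe a small piece of geometry. As a rough sketch of how they might combine (the helper `bullet_position` and its argument names are hypothetical, not part of fpdf2, and a positive y displacement is assumed to move the bullet downward):

```python
def bullet_position(col_left, top, indent, bullet_width,
                    rel_x_displacement=2.0, rel_y_displacement=0.0):
    """Return (text_x, bullet_x, bullet_y) for one bulleted paragraph.

    Illustrative only: all values are in document units.
    """
    # The paragraph text starts `indent` units right of the column edge.
    text_x = col_left + indent
    # The bullet's right edge sits `rel_x_displacement` left of the text,
    # so its left edge lies one bullet width further left.
    bullet_x = text_x - rel_x_displacement - bullet_width
    # Assumed convention: the offset is added to the top of the text.
    bullet_y = top + rel_y_displacement
    return text_x, bullet_x, bullet_y


text_x, bullet_x, bullet_y = bullet_position(
    col_left=10.0, top=20.0, indent=10.0, bullet_width=1.5
)
```

Because the text origin depends only on `indent`, a wider bullet string moves the bullet further left without shifting the text.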

85 changes: 63 additions & 22 deletions fpdf/html.py
@@ -36,8 +36,8 @@
}
DEFAULT_TAG_INDENTS = {
"blockquote": 0,
"dd": 10,
"li": 5,
"dd": 30,
"li": 30,
}

# Pattern to substitute whitespace sequences with a single space character each.
@@ -270,8 +270,8 @@ def __init__(
self,
pdf,
image_map=None,
li_tag_indent=5,
dd_tag_indent=10,
li_tag_indent=30,
dd_tag_indent=30,
table_line_separators=False,
ul_bullet_char=BULLET_WIN1252,
li_prefix_color=(190, 0, 0),
@@ -280,6 +280,7 @@ def __init__(
warn_on_tags_not_matching=True,
tag_indents=None,
tag_styles=None,
list_vertical_margin=None,
**_,
):
"""
@@ -296,8 +297,11 @@ def __init__(
heading_sizes (dict): [**DEPRECATED since v2.7.9**] font size per heading level names ("h1", "h2"...) - Set tag_styles instead
pre_code_font (str): [**DEPRECATED since v2.7.9**] font to use for <pre> & <code> blocks - Set tag_styles instead
warn_on_tags_not_matching (bool): control warnings production for unmatched HTML tags
tag_indents (dict): mapping of HTML tag names to numeric values representing their horizontal left identation
tag_indents (dict): mapping of HTML tag names to numeric values representing their horizontal left indentation.
The indent values are in the chosen pdf document units.
tag_styles (dict): mapping of HTML tag names to colors
list_vertical_margin (float): size of margins that precede lists.
The margin value is in the chosen pdf document units.
"""
super().__init__()
self.pdf = pdf
@@ -334,8 +338,11 @@ def __init__(
self.align = ""
self.style_stack = [] # list of FontFace
self.indent = 0
self.line_height_stack = []
self.ol_type = [] # when inside a <ol> tag, can be "a", "A", "i", "I" or "1"
self.bullet = []
if list_vertical_margin is None:
self.list_vertical_margin = 0.3 / self.pdf.k
self.font_color = pdf.text_color.colors255
self.heading_level = None
self.heading_above = 0.2 # extra space above heading, relative to font size
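
The hunk above assigns `self.list_vertical_margin` only when the argument is `None`. A minimal sketch of the full fallback, assuming an explicitly passed value should be stored unchanged (the helper name `resolve_list_vertical_margin` is hypothetical):

```python
def resolve_list_vertical_margin(list_vertical_margin, k):
    """Return the margin above top-level lists, in document units.

    `k` stands for the document scale factor (points per unit).
    """
    if list_vertical_margin is None:
        # Default: 0.3 pt, converted to document units.
        return 0.3 / k
    # An explicit value is assumed to already be in document units.
    return list_vertical_margin


default_margin = resolve_list_vertical_margin(None, k=1.0)
custom_margin = resolve_list_vertical_margin(5, k=72 / 25.4)
```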
@@ -352,7 +359,7 @@ def __init__(
# "inserted" is a special attribute indicating that a cell has been inserted in self.table_row

if not tag_indents:
tag_indents = {}
tag_indents = {k: v / self.pdf.k for k, v in DEFAULT_TAG_INDENTS.items()}
if dd_tag_indent != DEFAULT_TAG_INDENTS["dd"]:
warnings.warn(
(
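
The `v / self.pdf.k` expression above converts the point-based defaults into document units. A minimal sketch of that conversion, assuming `k` is the number of points per document unit (the `POINTS_PER_UNIT` table and the `default_indents` helper are illustrative names, not fpdf2 API):

```python
# Points per document unit for the common PDF units.
POINTS_PER_UNIT = {"pt": 1.0, "mm": 72 / 25.4, "cm": 72 / 2.54, "in": 72.0}

# Default values taken from the diff, interpreted as points.
DEFAULT_TAG_INDENTS = {"blockquote": 0, "dd": 30, "li": 30}


def default_indents(unit):
    """Convert the point-based default indents into the given unit."""
    k = POINTS_PER_UNIT[unit]
    return {tag: value / k for tag, value in DEFAULT_TAG_INDENTS.items()}


indents_mm = default_indents("mm")
indents_pt = default_indents("pt")
```

Under these assumptions the 30 pt default comes out at roughly 10.58 mm, so the rendered indentation stays the same whichever unit the document uses.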
@@ -420,7 +427,13 @@ def __init__(
)

def _new_paragraph(
self, align=None, line_height=1.0, top_margin=0, bottom_margin=0
self,
align=None,
line_height=1.0,
top_margin=0,
bottom_margin=0,
indent=0,
bullet="",
):
self._end_paragraph()
self.align = align or ""
@@ -432,6 +445,8 @@ def _new_paragraph(
skip_leading_spaces=True,
top_margin=top_margin,
bottom_margin=bottom_margin,
indent=indent,
bullet_string=bullet,
)
self.follows_trailing_space = True
self.follows_heading = False
@@ -545,10 +560,20 @@ def handle_starttag(self, tag, attrs):
parse_style(attrs)
self._tags_stack.append(tag)
if tag == "dt":
self._write_paragraph("\n")
self._new_paragraph(
line_height=(
self.line_height_stack[-1] if self.line_height_stack else None
),
)
tag = "b"
if tag == "dd":
self._write_paragraph("\n" + "\u00a0" * self.tag_indents["dd"])
self.follows_heading = True
self._new_paragraph(
line_height=(
self.line_height_stack[-1] if self.line_height_stack else None
),
indent=self.tag_indents["dd"] * (self.indent + 1),
)
if tag == "strong":
tag = "b"
if tag == "em":
@@ -659,38 +684,48 @@ def handle_starttag(self, tag, attrs):
size=tag_style.size_pt or self.font_size,
)
self.indent += 1
self._new_paragraph(top_margin=3, bottom_margin=3)
if self.tag_indents["blockquote"]:
self._write_paragraph("\u00a0" * self.tag_indents["blockquote"])
self._new_paragraph(
top_margin=9 / self.pdf.k,
bottom_margin=9 / self.pdf.k,
indent=self.tag_indents["blockquote"] * self.indent,
)
if tag == "ul":
self.indent += 1
bullet_char = (
ul_prefix(attrs["type"]) if "type" in attrs else self.ul_bullet_char
)
self.bullet.append(bullet_char)
line_height = None
if "line-height" in attrs:
try:
# YYY parse and convert non-float line_height values
line_height = float(attrs.get("line-height"))
self.line_height_stack.append(float(attrs.get("line-height")))
except ValueError:
pass
self._new_paragraph(line_height=line_height)
else:
self.line_height_stack.append(None)
if self.indent == 1:
self._new_paragraph(top_margin=self.list_vertical_margin, line_height=0)
self._write_paragraph("\u00a0")
self._end_paragraph()
if tag == "ol":
self.indent += 1
start = int(attrs["start"]) if "start" in attrs else 1
self.bullet.append(start - 1)
self.ol_type.append(attrs.get("type", "1"))
line_height = None
if "line-height" in attrs:
try:
# YYY parse and convert non-float line_height values
line_height = float(attrs.get("line-height"))
self.line_height_stack.append(float(attrs.get("line-height")))
except ValueError:
pass
self._new_paragraph(line_height=line_height)
else:
self.line_height_stack.append(None)
if self.indent == 1:
self._new_paragraph(top_margin=self.list_vertical_margin, line_height=0)
self._write_paragraph("\u00a0")
self._end_paragraph()
if tag == "li":
self._ln(2)
self._ln(6 / self.pdf.k)
self.set_text_color(*self.li_prefix_color)
if self.bullet:
bullet = self.bullet[self.indent - 1]
@@ -701,9 +736,14 @@ def handle_starttag(self, tag, attrs):
bullet += 1
self.bullet[self.indent - 1] = bullet
ol_type = self.ol_type[self.indent - 1]
bullet = f"{ol_prefix(ol_type, bullet)}. "
indent = "\u00a0" * self.tag_indents["li"] * self.indent
self._write_paragraph(f"{indent}{bullet} ")
bullet = f"{ol_prefix(ol_type, bullet)}."
self._new_paragraph(
line_height=(
self.line_height_stack[-1] if self.line_height_stack else None
),
indent=self.tag_indents["li"] * self.indent,
bullet=bullet,
)
self.set_text_color(*self.font_color)
if tag == "font":
# save previous font state:
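
The `<ul>`, `<ol>` and `<li>` branches above share one piece of bookkeeping: a nesting level, a stack of bullets, and an indent that scales with the level. The standalone sketch below reproduces that bookkeeping with the standard-library `html.parser`. `ListTracker` is a hypothetical class for illustration only; it records what would be passed to a paragraph instead of rendering anything, supports only numeric `<ol>` numbering, and reads the innermost list with `[-1]` rather than `[self.indent - 1]`.

```python
from html.parser import HTMLParser


class ListTracker(HTMLParser):
    """Record (indent, bullet) for every <li>, without rendering."""

    def __init__(self, li_indent=30.0, bullet_char="\u2022"):
        super().__init__()
        self.li_indent = li_indent
        self.bullet_char = bullet_char
        self.level = 0     # nesting depth of the current list
        self.bullets = []  # per open list: str for <ul>, int counter for <ol>
        self.items = []    # one (indent, bullet string) pair per <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            self.level += 1
            self.bullets.append(self.bullet_char)
        elif tag == "ol":
            self.level += 1
            # Store start - 1 so the first <li> increments to `start`.
            self.bullets.append(int(attrs.get("start", 1)) - 1)
        elif tag == "li" and self.bullets:
            bullet = self.bullets[-1]
            if isinstance(bullet, int):
                bullet += 1
                self.bullets[-1] = bullet
                bullet = f"{bullet}."
            # The indent grows linearly with the nesting depth.
            self.items.append((self.li_indent * self.level, bullet))

    def handle_endtag(self, tag):
        if tag in ("ul", "ol"):
            self.level -= 1
            self.bullets.pop()


tracker = ListTracker()
tracker.feed("<ul><li>a</li><li>b<ol start='3'><li>c</li><li>d</li></ol></li></ul>")
```

Since the indent is a length rather than a run of non-breaking spaces, it no longer depends on the font size or on the width of the bullet string, which is the behaviour the changelog entry describes.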
@@ -902,6 +942,7 @@ def handle_endtag(self, tag):
self.indent -= 1
if tag == "ol":
self.ol_type.pop()
self.line_height_stack.pop()
self.bullet.pop()
if tag == "table":
self.table.render()
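
Because the end-tag handler pops `line_height_stack` unconditionally, every list start tag needs to push exactly one entry. As far as the visible hunks show, a `line-height` value that fails to parse is skipped with `pass`, which would leave the stack one entry short. One possible way to keep push and pop balanced, sketched with hypothetical helper names:

```python
def push_line_height(stack, attrs):
    """Push exactly one entry per list start tag."""
    try:
        stack.append(float(attrs["line-height"]))
    except (KeyError, TypeError, ValueError):
        # Missing or non-numeric value: push None so the later pop
        # removes this list's own entry and not a parent's.
        stack.append(None)


def current_line_height(stack):
    """Line height of the innermost open list, or None outside lists."""
    return stack[-1] if stack else None


stack = []
push_line_height(stack, {"line-height": "1.5"})   # parsed as a float
push_line_height(stack, {"line-height": "150%"})  # not a float
push_line_height(stack, {})                       # attribute absent
```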