endless loop in table layout #660

Closed
JohannesMunk opened this issue Jul 30, 2018 · 21 comments

@JohannesMunk

Hello again!

Over the weekend I successfully boiled down another problem that has haunted us for a couple of weeks and prevents us from outputting some files. As these files are pretty big, such a constellation, random as it might seem, occurs with stubborn regularity.

This bug requires a specific layout situation with given widths and content. I hope it is reproducible on your side. On my side the following HTML does not convert; instead, WeasyPrint keeps allocating memory until it runs out or the process is aborted.

<html lang="de">
<head>
	<meta charset="utf-8" />
	<style>
		body {
			margin: 1px;
			font-size: 9pt !important;
			font-family: Helvetica, Arial, sans-serif;
		}
		.options {
			column-count: 2;
			column-gap: 3.5em;
			margin-left: 1cm;
			margin-right: 2cm;
		}
		table {
			width: 100%;
			border-spacing: 0;
			border-collapse: collapse; 
			font-size: 0.9em !important;
		}
		table, td {
			padding: 0;	
			margin: 0;
		}
		@media print {
			@page {
				size: A4;
				margin: 0;
			}
		}
	</style>
</head>
<body>
  <div class="options">
    <table><tr>
      <td style="width:1.5em;"></td>
      <td>(<span><span>Standardeinstellung für Antriebs-/Lenk- Befehlscode - vorwärts,
        rückwärts, links, rechts</span></span>)</td>
      <td style="width: 7em;"></td>
    </tr></table>
  </div>
</body>
</html>

I let this run in cProfile:

 Ordered by: cumulative time

ncalls tottime cumtime filename:lineno(function)
   274   0.009  48.899 {built-in method builtins.exec}
     1   0.000  46.891 ..\weasyprint\__init__.py:148(write_pdf)
     1   0.000  46.891 ..\weasyprint\__init__.py:116(render)
     1   0.000  46.891 ..\weasyprint\document.py:306(_render)
     1   0.000  46.871 ..\weasyprint\document.py:334(<listcomp>)
     1   0.000  46.871 ..\weasyprint\layout\__init__.py:39(layout_document)
     1   0.000  46.871 ..\weasyprint\layout\pages.py:606(make_all_pages)
     1   0.000  46.868 ..\weasyprint\layout\pages.py:512(make_page)
     5   0.000  46.868 ..\weasyprint\layout\blocks.py:27(block_level_layout)
     5   0.000  46.868 ..\weasyprint\layout\blocks.py:80(block_box_layout)
     6   0.032  46.868 ..\weasyprint\layout\blocks.py:404(block_container_layout)
     1   0.000  46.867 ..\weasyprint\layout\blocks.py:123(columns_layout)
     1   0.000  46.827 ..\weasyprint\layout\tables.py:18(table_layout)
     1   0.000  46.827 ..\weasyprint\layout\tables.py:273(all_groups_layout)
     1   0.000  46.827 ..\weasyprint\layout\tables.py:241(body_groups_layout)
     1   0.000  46.827 ..\weasyprint\layout\tables.py:62(group_layout)
 28101   0.044  46.791 ..\weasyprint\layout\inlines.py:29(iter_line_boxes)
 28101   0.601  46.747 ..\weasyprint\layout\inlines.py:62(get_next_linebox)
112419   2.258  29.844 ..\weasyprint\text.py:910(split_first_line)
 28100   1.767  25.276 ..\weasyprint\layout\inlines.py:634(split_inline_box)
289541   0.500  24.180 {built-in method builtins.next}
 56201   0.353  22.892 ..\weasyprint\layout\inlines.py:555(split_inline_level)
 28102   0.069  17.947 ..\weasyprint\layout\preferred.py:179(inline_min_content_width)
 56214   0.878  17.648 ..\weasyprint\layout\preferred.py:220(inline_line_widths)
 56201   0.503  14.811 ..\weasyprint\layout\inlines.py:931(split_text_box)
758827  11.978  12.425 ..\weasyprint\text.py:665(iter_lines)
182684   1.038   9.204 ..\weasyprint\text.py:826(create_layout)
182685   3.935   6.636 ..\weasyprint\text.py:618(__init__)
112419   0.946   4.435 ..\weasyprint\text.py:580(first_line_metrics)
168616   1.271   4.023 ..\weasyprint\layout\percentages.py:59(resolve_percentages)
112403   0.154   3.299 ..\weasyprint\formatting_structure\boxes.py:322(copy_with_children)
...


Ordered by: call count

 ncalls tottime cumtime filename:lineno(function)
3738901   0.575   0.575 {built-in method builtins.isinstance}
3668880   0.642   0.643 {built-in method builtins.setattr}
2234161   0.568   0.568 ..\weasyprint\layout\percentages.py:15(_percentage)
2234161   1.288   2.208 ..\weasyprint\layout\percentages.py:32(resolve_one_percentage)
1447384   0.469   0.495 ..\cffi\api.py:171(_typeof)
1264699   1.089   2.350 ..\cffi\api.py:233(new)
1264699   0.687   0.687 {built-in method _cffi_backend.newp}
1060589   0.131   0.131 {built-in method builtins.len}
1029475   0.466   0.466 {method 'encode' of 'str' objects}
 815049   0.390   0.819 ..\cffi\api.py:404(gc)
 815049   0.428   0.428 {built-in method _cffi_backend.gcp}
 758828   0.343   0.343 {method 'replace' of 'bytes' objects}
 758828   0.719   2.740 ..\weasyprint\text.py:554(unicode_to_char_p)
 758827  11.978  12.425 ..\weasyprint\text.py:665(iter_lines)
 289541   0.500  24.180 {built-in method builtins.next}
 675484   0.336   0.336 {method 'format' of 'str' objects}
 477770   1.288   2.219 ..\weasyprint\text.py:564(get_size)
 451777   1.232   1.232 {built-in method __new__ of type object at 0x000000005797C430}
 449648   0.329   0.351 ..\weasyprint\formatting_structure\boxes.py:296(enumerate_skip)
 407501   0.058   0.058 {method 'append' of 'list' objects}
 365344   0.107   0.107 ..\weasyprint\formatting_structure\boxes.py:272(is_absolutely_positioned)
 337257   0.980   2.173 ..\weasyprint\text.py:674(set_text)
 309125   0.074   0.074 ..\weasyprint\formatting_structure\boxes.py:268(is_floated)
 301427   0.085   0.085 {method 'replace' of 'str' objects}
 289713   0.099   0.100 {method 'join' of 'str' objects}
 284618   0.195   0.195 {method 'decode' of 'bytes' objects}
 281001   0.083   0.083 ..\weasyprint\formatting_structure\boxes.py:136(padding_height)
 281001   0.138   0.222 ..\weasyprint\formatting_structure\boxes.py:145(border_height)
 224804   0.103   0.103 ..\weasyprint\formatting_structure\boxes.py:132(padding_width)
 224804   0.136   0.238 ..\weasyprint\formatting_structure\boxes.py:140(border_width)
 210752   0.047   0.047 ..\weasyprint\text.py:1194(<genexpr>)
 197870   0.128   0.128 {method 'split' of 'str' objects}
 196701   0.415   0.415 ..\weasyprint\css\computed_values.py:664(strut_layout)
 112400   0.155   0.171 ..\weasyprint\formatting_structure\boxes.py:115(translate)
 186049   0.258   0.403 {built-in method builtins.hasattr}
 184778   0.069   0.069 {method 'rpartition' of 'str' objects}
 183026   0.092   1.675 <frozen importlib._bootstrap>:997(_handle_fromlist)
 183378   0.146   0.215 <frozen importlib._bootstrap>:416(parent)
 182686   0.196   0.388 ..\cffi\api.py:284(cast)
 182686   0.085   0.085 {built-in method _cffi_backend.cast}
 182685   3.935   6.636 ..\weasyprint\text.py:618(__init__)
 182684   1.038   9.204 ..\weasyprint\text.py:826(create_layout)
 182683   0.523   0.523 ..\weasyprint\text.py:724(get_font_features)
 169267   0.277   0.277 {method 'update' of 'dict' objects}
  56214   0.878  17.648 ..\weasyprint\layout\preferred.py:220(inline_line_widths)
 168616   1.271   4.023 ..\weasyprint\layout\percentages.py:59(resolve_percentages)
 168609   0.071   0.071 ..\weasyprint\formatting_structure\boxes.py:160(content_box_x)
 168605   0.262   0.590 ..\weasyprint\formatting_structure\boxes.py:104(copy)
 156652   0.067   0.067 {method 'endswith' of 'str' objects}
 155313   0.025   0.025 {method 'extend' of 'list' objects}
 140501   0.091   0.243 ..\weasyprint\formatting_structure\boxes.py:150(margin_width)
  56201   0.353  22.892 ..\weasyprint\layout\inlines.py:555(split_inline_level)
 140499   0.575   1.991 ..\weasyprint\formatting_structure\boxes.py:305(_reset_spacing)
 126467   0.088   0.151 ..\weasyprint\formatting_structure\boxes.py:276(is_in_normal_flow)
 113112   0.046   0.046 {built-in method builtins.max}
 112419   0.946   4.435 ..\weasyprint\text.py:580(first_line_metrics)
 112419   2.258  29.844 ..\weasyprint\text.py:910(split_first_line)
 112410   0.096   0.255 ..\weasyprint\text.py:550(utf8_slice)
 112403   0.154   3.299 ..\weasyprint\formatting_structure\boxes.py:322(copy_with_children)
 112402   0.669   0.669 {method 'copy' of 'dict' objects}
  28101   0.157   0.188 ..\weasyprint\layout\inlines.py:179(skip_first_whitespace)
 112401   0.054   0.149 ..\weasyprint\formatting_structure\boxes.py:154(margin_height)
 112400   0.037   0.037 ..\weasyprint\layout\inlines.py:898(<listcomp>)
  28100   1.767  25.276 ..\weasyprint\layout\inlines.py:634(split_inline_box)
  98364   0.141   0.248 ..\pyphen\__init__.py:56(language_fallback)
  90402   0.042   0.042 {method 'rstrip' of 'str' objects}
  88412   0.046   0.046 {built-in method builtins.min}
  84550   0.029   0.029 {method 'strip' of 'str' objects}
  84320   0.271   0.271 ..\weasyprint\layout\preferred.py:125(margin_width)
  84318   0.150   0.223 ..\weasyprint\layout\preferred.py:110(min_max)
  84318   0.075   0.569 ..\weasyprint\layout\preferred.py:153(adjust)
  84300   0.028   0.028 ..\weasyprint\formatting_structure\boxes.py:182(border_box_y)
  84300   0.012   0.012 ..\weasyprint\formatting_structure\boxes.py:293(all_children)
  84300   0.102   2.213 ..\weasyprint\formatting_structure\boxes.py:418(_remove_decoration)
  84300   0.016   0.016 ..\weasyprint\layout\float.py:145(<listcomp>)
  84300   0.010   0.010 ..\weasyprint\layout\float.py:155(<listcomp>)
  84300   0.009   0.009 ..\weasyprint\layout\float.py:159(<listcomp>)
  84300   0.464   0.825 ..\weasyprint\layout\float.py:133(avoid_collisions)
  28100   0.272   0.518 ..\weasyprint\layout\inlines.py:1072(inline_box_verticality)
  28100   0.084   0.142 ..\weasyprint\layout\inlines.py:1229(is_phantom_linebox)
  29517   0.044   0.466 {built-in method builtins.any}
  56508   0.068   0.156 I:\x3rdParty\Python\lib\re.py:286(_compile)
  56237   0.008   0.008 {method 'reverse' of 'list' objects}
  56220   0.100   0.100 {method 'finditer' of '_sre.SRE_Pattern' objects}
  56208   0.044   0.211 I:\x3rdParty\Python\lib\re.py:224(finditer)
  56208   0.045   0.045 ..\weasyprint\text.py:1046(<listcomp>)
  56208   0.009   0.009 ..\weasyprint\text.py:1050(<listcomp>)
  56201   0.042   0.284 ..\weasyprint\formatting_structure\boxes.py:474(copy_with_text)
  56201   0.503  14.811 ..\weasyprint\layout\inlines.py:931(split_text_box)
  28100   0.018   0.439 ..\weasyprint\layout\inlines.py:1257(<genexpr>)
  14050   0.078   0.482 ..\weasyprint\layout\inlines.py:1245(can_break_inside)
  31266   0.006   0.006 {method 'startswith' of 'str' objects}
  28103   0.036   0.585 ..\weasyprint\formatting_structure\boxes.py:314(_remove_decoration)
  28102   0.069  17.947 ..\weasyprint\layout\preferred.py:179(inline_min_content_width)
  28101   0.293   0.661 ..\weasyprint\text.py:1176(can_break_text)
  28101   0.005   0.005 ..\weasyprint\formatting_structure\boxes.py:87(all_children)
  28101   0.044  46.791 ..\weasyprint\layout\inlines.py:29(iter_line_boxes)
  28101   0.601  46.747 ..\weasyprint\layout\inlines.py:62(get_next_linebox)
  28101   0.006   0.006 ..\weasyprint\layout\inlines.py:261(first_letter_to_box)
  28100   0.116   0.167 ..\weasyprint\layout\inlines.py:218(remove_last_whitespace)
  28100   0.005   0.005 ..\weasyprint\layout\inlines.py:1018(<listcomp>)
  28100   0.043   0.636 ..\weasyprint\layout\inlines.py:1003(line_box_verticality)
  28100   0.036   0.588 ..\weasyprint\layout\inlines.py:1059(aligned_subtree_verticality)
  28100   0.032   0.032 ..\weasyprint\layout\inlines.py:1149(text_align)
  21131   0.007   0.007 {method 'pop' of 'list' objects}
  20270   0.003   0.003 {method 'get' of 'dict' objects}
  19322   0.007   0.007 I:\x3rdParty\Python\lib\sre_parse.py:232(__next)

I don't know how to debug or step through Python. But what I read from the profile is that iter_line_boxes is called a few times too often!
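For readers who want to collect a similar profile, here is a minimal sketch using only the standard library. The helper name `profile_call` and the `busy_work` stand-in are hypothetical; in practice the profiled callable would be whatever triggers the rendering. Because a non-terminating layout has to be interrupted by hand, the sketch catches `KeyboardInterrupt` and still reports what was collected up to that point.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_key="cumulative", limit=25, **kwargs):
    """Run func under cProfile and return (result, report_text).

    If the call is interrupted with Ctrl+C (useful for a layout that never
    terminates), the statistics gathered so far are still reported.
    """
    profiler = cProfile.Profile()
    try:
        result = profiler.runcall(func, *args, **kwargs)
    except KeyboardInterrupt:
        result = None  # interrupted; the profiler has already been disabled
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return result, buffer.getvalue()


def busy_work(n):
    # Hypothetical stand-in for the real rendering call.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # Sorted by cumulative time, like the first listing above.
    _, report = profile_call(busy_work, 10_000)
    print(report)
    # Sorted by call count, like the second listing above.
    _, report = profile_call(busy_work, 10_000, sort_key="calls")
    print(report)
```

A function whose call count keeps growing while the functions above it in the call chain are entered only once, as with iter_line_boxes here, is a good first suspect for a loop that never ends.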

Let me know if I can be of more assistance, and thanks a lot in advance!

Johannes

@liZe
Member

liZe commented Jul 30, 2018

This bug requires a specific layout situation with given widths and content. I hope it is reproducible on your side.

Unfortunately, it's not for me. Could you try to reproduce with a free font instead of Helvetica/Arial?

There's probably a problem with the columns; this feature is young and not widely used.

I don't know how to debug or step through Python. But what I read from the profile is that iter_line_boxes is called a few times too often!

Using pdb may be useful, but it's hard to know where to put breakpoints when there's an endless loop. I'll try to explain how I would debug as soon as I can reproduce this error, I hope it'll help everyone to find a way to know what's going on…
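One way to find out where a non-terminating call is spending its time, without knowing in advance where to put a breakpoint, is to run it in a worker thread and take a snapshot of that thread's call stack after a timeout. The sketch below uses only the standard library; `stack_if_stuck` and `spin_forever` are hypothetical names, and `spin_forever` merely stands in for a layout call that never returns.

```python
import sys
import threading
import traceback


def stack_if_stuck(func, timeout=5.0):
    """Run func in a daemon thread.

    Return a formatted snapshot of the thread's call stack if it is still
    running after `timeout` seconds, or None if it finished in time.
    """
    worker = threading.Thread(target=func, daemon=True)
    worker.start()
    worker.join(timeout)
    if not worker.is_alive():
        return None
    frame = sys._current_frames().get(worker.ident)
    if frame is None:  # the thread finished between the two checks
        return None
    return "".join(traceback.format_stack(frame))


stop = threading.Event()


def spin_forever():
    # Hypothetical stand-in for a layout call that never returns.
    while not stop.is_set():
        pass


if __name__ == "__main__":
    print(stack_if_stuck(spin_forever, timeout=0.2))
    stop.set()  # let the demo thread end
```

Taking the snapshot a few times and comparing the results shows which frames stay on the stack and which ones change, which narrows down where a breakpoint is worth setting. The standard library's `faulthandler.dump_traceback_later` offers a similar timeout-based dump without a helper thread.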

@liZe
Member

liZe commented Jul 30, 2018

Another possibility: this bug may be a duplicate of #614, as you're using Windows. If you use Pango < 1.40.13, then that's almost certainly the cause.

@JohannesMunk
Author

Hey liZe! Thanks for looking into this. You are right, I am running Windows. But I just updated my GTK3 to the newest runtime dist and have now Pango 1.42.1.0. Sadly with the same endless loop. The same happens under macOS with all the latest dists through Homebrew. So this seems to be something new. I will try to create a situation with another font!

@JohannesMunk
Author

.. trying to figure out a font that would be available for you. Would "Verdana" work?

@liZe
Member

liZe commented Jul 30, 2018

But I just updated my GTK3 to the newest runtime dist and have now Pango 1.42.1.0. Sadly with the same endless loop.

😢

trying to figure out a font that would be available for you. Would "Verdana" work?

Any font that's free and that I can easily download anywhere. Using Google Fonts' @import rules is also a solution.

@Tontyna
Contributor

Tontyna commented Jul 30, 2018

Can reproduce the issue.
Seems to be another windowish font problem: No endless loop when using font-family: DejaVu Sans, sans-serif;

Will try to debug and catch...

@Tontyna
Contributor

Tontyna commented Jul 30, 2018

OMG! It's weird!
The endless loop is triggered by the (seemingly ineffective, completely useless, HA!) doubled <span><span>. With it reduced to a single span, the document renders fine.

Is this Cairo again? Akin to #628?

@Tontyna
Contributor

Tontyna commented Jul 30, 2018

It's not the column-count and it's not the table. It's definitely the double span followed by the closing bracket (!), in combination with those windowish fonts and a particular width of the containing box, where the inline splitting runs into an infinite loop. I constructed a simple div with the required width and the double span and WHOOM!
Only difference: the table with column-count gets stuck in the layout of page 1, whereas the simple div never stops producing pages.

Probably another discrepancy in the calculation of text widths between ??? and ??? -- dunno yet. Yes, it looks like #614 and #585, but that bug has been fixed...

@JohannesMunk
Author

Hey Tontyna! Thanks for your digging! Cool, that you could reduce it further to the outside div.

Concerning the double spans: in the non-reduced file the spans of course have different classes and attributes. Yes, I could programmatically combine them. But as the double spans work with other content in between, it must be the combination of things, like you pointed out.

@Tontyna
Contributor

Tontyna commented Jul 30, 2018

That's the output when I break the make_all_pages-loop:

[Screenshot: endless]

Interestingly the opening bracket isn't repeated.

@JohannesMunk
Author

Binary search of first problem occurrence in document after switching to DejaVu:

<html lang="de">
 <head>
  <meta charset="utf-8" />
  <title>All Truckgroups</title>
  <style>
		body {
			margin: 1px;
			font-size: 9pt !important;
			font-family: DejaVu Sans, sans-serif;
		}
		.options {
			column-count: 2;
			column-gap: 3.5em;
			margin-left: 1cm;
			margin-right: 2cm;
		}
		table {
			width: 100%;
			border-spacing: 0;
			border-collapse: collapse; 
			font-size: 0.9em !important;
		}
		table, td {
			padding: 0;	
			margin: 0;
		}
		@media print {
			@page {
				size: A4;
				margin: 0;
			}
		}
	</style>
 </head>
 <body><div id="full">
   <div class="options">
    <div><span><span>Wenn der Schlauch angebracht ist, ist kein Herausheben der Batterie am NT-Mast
       und für Hubhöhen ≤ 2.600 mm an TL/TF-Masten möglich</span></span><span><span>nicht verfügbar
       bei NT Mast</span></span></div>
   </div>
  </div></body>
</html>

Hope this is now reproducible everywhere?

I downloaded and installed 2.37 of the ttf fonts from https://dejavu-fonts.github.io/Download.html

Good night and good luck!

And thank you!

@Tontyna
Contributor

Tontyna commented Jul 31, 2018

Eliminating the second double-span still results in an endless loop when the immediately following word is too long to fit into the same line:

<div class="options">
  <div><span><span>Wenn der Schlauch angebracht ist, ist kein Herausheben der Batterie am NT-Mast
       und für Hubhöhen ≤ 2.600 mm an TL/TF-Masten möglich</span></span>xxx
  </div>
</div>

Separating the "xxx" from the double-span -- e.g. with an LF or a SPACE -- prevents the infinite loop:

<div class="options">
  <div><span><span>Wenn der Schlauch angebracht ist, ist kein Herausheben der Batterie am NT-Mast
       und für Hubhöhen ≤ 2.600 mm an TL/TF-Masten möglich</span></span>
xxx
  </div>
</div>

Definitely a mutation of #614

@JohannesMunk
Author

@Tontyna : cool! I really like your clear cut approach in #614.

Am I correct in saying that by HTML standards this additional LF or SPACE should not matter?

I will try to introduce some in my output and see if my files work.
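A minimal sketch of how such separators could be introduced programmatically, assuming the generated markup contains the literal sequence `</span></span>` directly followed by non-whitespace. The helper name `separate_double_spans` is hypothetical, and the regular expression is deliberately naive: it is not an HTML parser, and the inserted space is rendered as a visible space between inline content. It is a stopgap for versions without the fix, not a recommended permanent transformation.

```python
import re

# Matches a doubled closing span that is immediately followed by a
# non-whitespace character (text or another tag).
_DOUBLE_CLOSE = re.compile(r"(</span></span>)(?=\S)")


def separate_double_spans(html: str) -> str:
    """Insert one space after each '</span></span>' that touches what follows."""
    return _DOUBLE_CLOSE.sub(r"\1 ", html)


if __name__ == "__main__":
    sample = "<div><span><span>some long text</span></span>xxx</div>"
    print(separate_double_spans(sample))
```

Markup in which the doubled span is already followed by a space or a line feed is left unchanged.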

Thanks for your support!

@Tontyna
Contributor

Tontyna commented Jul 31, 2018

@JohannesMunk : Thx.

But this issue must be caught one level higher. I almost understand where it happens but am not (yet) able to fix it.

That's the situation:

 <LineBox div>
   <InlineBox span>
     <InlineBox span>
       <TextBox span> 
         Text that should be spread over 2 lines. Having a bit of 
         space at the end. But not enough for the following xxx
   <TextBox div>xxx 

Since there is no SPACE or LF after the <span>, WeasyPrint tries to put the "xxx" on the second line and detects that there isn't enough room.
Now, I think, it SHOULD skip-stack back to the start of the already correctly broken second line.
But instead, it skip-stacks back to the start of the InlineBox.

Watching resume_at/skip_stack in get_next_linebox (calculated by split_inline_box) reveals that it oscillates endlessly between two values:

(0, (0, (0, (30, None))))
(0, (0, (0, None)))
(0, (0, (0, (30, None))))
(0, (0, (0, None)))
(0, (0, (0, (30, None))))
(0, (0, (0, None)))
....

The "30" above is (as far as I understand) the index of the first letter of the (successfully split) second text line, but the next call to split_inline_box jumps back again (not sure, but I think to the start of the <InlineBox span>).
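The oscillation can be made explicit with a small sketch. It assumes only what the output above shows, namely that each resume position is a nested pair of the form (index, rest) terminated by None. The helper names `flatten_skip_stack` and `first_repeat` are hypothetical debugging aids. They are not part of WeasyPrint and not the fix that was eventually committed.

```python
def flatten_skip_stack(skip_stack):
    """Turn nested (index, rest) pairs into a flat list of indices."""
    path = []
    while skip_stack is not None:
        index, skip_stack = skip_stack
        path.append(index)
    return path


def first_repeat(states):
    """Return (first_seen, repeated_at) for the first state seen twice, or None."""
    seen = {}
    for position, state in enumerate(states):
        if state in seen:
            return seen[state], position
        seen[state] = position
    return None


# The resume positions logged above, in order.
observed = [
    (0, (0, (0, (30, None)))),
    (0, (0, (0, None))),
    (0, (0, (0, (30, None)))),
    (0, (0, (0, None))),
]

if __name__ == "__main__":
    for state in observed:
        print(flatten_skip_stack(state))
    print(first_repeat(observed))
```

Layout that makes progress should never resume from a position it has already resumed from, so a repeated state in such a log is a direct sign of a loop like the one described here.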

@liZe
Member

liZe commented Jul 31, 2018

Minimal case:

<div style="font-family: Ahem; width: 3.5em">
<span><span>xxx x x</span></span><span>x

@liZe
Member

liZe commented Jul 31, 2018

Good news: I've been saved by a comment 😉. Fix coming soon.

liZe added the bug (Existing features not working as expected) label Jul 31, 2018
liZe added this to the 43 milestone Jul 31, 2018
liZe closed this as completed in 81b3ee6 Jul 31, 2018
@liZe
Member

liZe commented Jul 31, 2018

@JohannesMunk @Tontyna Thanks a lot for your bug report, examples and hard work.

The commit message and diff should be self-explanatory. This function is really tricky and quite recent (it was modified to fix #163), which is why a lot of comments had been added. I would have spent days fixing this without that comment, which is another reason to add more (but not too many), as discussed in #659.

@Tontyna
Contributor

Tontyna commented Jul 31, 2018

Oh yes, I remember having seen those grandchildren before. I didn't like them at all 😉

@JohannesMunk
Author

Hey you two!

I just successfully generated 6 x 125 pages of PDFs!! Great stuff! Thanks a lot for your super nice and quick responses and fixes. Everything working now! I am impressed by the thorough regression tests!

If you are interested I have another open issue concerning multiple columns and horizontal sizing of a table inside. I probably will be able to work around it, or shall I extract the problem and submit another issue for it?

All the best and thank you again!

Johannes

@liZe
Member

liZe commented Aug 1, 2018

I just successfully generated 6 x 125 pages of PDFs!! Great stuff! Thanks a lot for your super nice and quick responses and fixes. Everything working now! I am impressed by the thorough regression tests!

😃

If you are interested I have another open issue concerning multiple columns and horizontal sizing of a table inside. I probably will be able to work around it, or shall I extract the problem and submit another issue for it?

It looks like an awful problem, but I'd be happy to have a separate issue for that.

All the best and thank you again!

No problem! We currently have a tiny opinion survey open in #635, would you be interested in writing a little message? I'm curious about your 125-page documents 😉.

@Tontyna
Contributor

Tontyna commented Aug 1, 2018

Ah, the survey - isn't there a way to promote it? It's already buried in the open issues. Maybe via an issue template?
