Break non-breaking hyphen #462

Closed
aka-demik opened this issue May 27, 2016 · 16 comments
@aka-demik

aka-demik commented May 27, 2016

My test code:

Aaaa bbbbbb ccccccccc ddddd eeeeeeeee ffffffffffffff gggggggggggg hhhhhhh iiiiiii RFC{nbsp}P{nbsp}IEC{nbsp}60870‑5‑104

Render:
[screenshot: rendered output]

> asciidoctor --version
Asciidoctor 1.5.4 [http://asciidoctor.org]
Runtime Environment (ruby 2.2.3p173 (2015-08-18 revision 51636) [i386-mingw32]) (lc:IBM866 fs:Windows-1251 in:- ex:IBM866)

> asciidoctor-pdf --version
Asciidoctor PDF 1.5.0.alpha.11 using Asciidoctor 1.5.4 [http://asciidoctor.org]
Runtime Environment (ruby 2.2.3p173 (2015-08-18 revision 51636) [i386-mingw32]) (lc:IBM866 fs:Windows-1251 in:- ex:IBM866)
@meisterluk
Contributor

Interestingly, I cannot confirm this on xubuntu 16.04.

[screenshot: non-breaking-hyphen]

meisterluk@sensei ~ % asciidoctor --version
Asciidoctor 1.5.4 [http://asciidoctor.org]
Runtime Environment (ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:- ex:UTF-8)
meisterluk@sensei ~ % asciidoctor-pdf --version
Asciidoctor 1.5.4 [http://asciidoctor.org]
Runtime Environment (ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:- ex:UTF-8)
meisterluk@sensei ~ % uname -a
Linux sensei 4.4.0-36-generic #55-Ubuntu SMP Thu Aug 11 18:01:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
meisterluk@sensei ~ % lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:    16.04
Codename:   xenial

This is only informative. I didn't interpret it any further.

@mojavelinux
Member

This happens because the non-breaking hyphen is not in the Noto Serif font shipped with Asciidoctor PDF. The reason that matters actually has to do with an internal quirk in Prawn. When Prawn can't find the character in the specified font, and a fallback font is in use, it splits off that fragment from the rest of the string to mark it as having a different font.

[{:text=>"Aaaa bbbbbb ccccccccc ddddd eeeeeeeee ffffffffffffff gggggggggggg hhhhhhh iiiiiii RFC P IEC 60870", :color=>"333333"}, {:text=>"‑", :color=>"333333", :font=>"M+ 1p Fallback"}, {:text=>"5", :color=>"333333"}, {:text=>"‑", :color=>"333333", :font=>"M+ 1p Fallback"}, {:text=>"104", :color=>"333333"}]

The line wrap logic permits a break between any two fragments in this array. Hence, it's allowing a break not because it treats the non-breaking hyphen as a break opportunity, but because the non-breaking hyphen is missing from the font (and gets split off from the rest of the string).
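A minimal sketch of the splitting described above, in plain Ruby without Prawn. The `MAIN_FONT_GLYPHS` set and the `partition_by_font` helper are hypothetical stand-ins for the font lookup Prawn performs; only the fragment hash shape and the "M+ 1p Fallback" font name are taken from the dump above.

```ruby
require 'set'

NB_HYPHEN = "\u2011" # NON-BREAKING HYPHEN

# Hypothetical glyph coverage of the main font: digits only, no U+2011.
MAIN_FONT_GLYPHS = Set.new(('0'..'9').to_a)

# Split text into fragments, tagging runs of characters the main font lacks
# with the fallback font. This imitates the dump above; it is not Prawn's code.
def partition_by_font(text, glyphs, fallback_font)
  runs = text.each_char.chunk_while { |a, b| glyphs.include?(a) == glyphs.include?(b) }
  runs.map do |run|
    fragment = { text: run.join }
    fragment[:font] = fallback_font unless glyphs.include?(run.first)
    fragment
  end
end

text = "60870#{NB_HYPHEN}5#{NB_HYPHEN}104"
fragments = partition_by_font(text, MAIN_FONT_GLYPHS, 'M+ 1p Fallback')
fragments.each { |f| p f }

# Every boundary between two fragments is a place where the line may break.
puts "fragment boundaries: #{fragments.size - 1}"
```

With the hyphen missing from the main font, the number comes out as five fragments and four boundaries. If U+2011 were added to the glyph set, the whole string would stay in one fragment and offer no boundary at all.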

I'll go ahead and add the non-breaking hyphen to the Noto Serif font...but this problem is going to come up anytime the main font is missing the character in use.

Btw, as a workaround, you can use an en-dash instead. Here's how that same string gets represented when the en-dash is used (which is in the Noto Serif font):

[{:text=>"Aaaa bbbbbb ccccccccc ddddd eeeeeeeee ffffffffffffff gggggggggggg hhhhhhh iiiiiii RFC P IEC 60870–5–104", :color=>"333333"}]
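If the source text can't easily be edited by hand, the same substitution can be done programmatically. A minimal sketch in plain Ruby; `use_en_dash` is a hypothetical helper, not part of Asciidoctor PDF.

```ruby
NB_HYPHEN = "\u2011" # NON-BREAKING HYPHEN
EN_DASH   = "\u2013" # EN DASH

# Swap each non-breaking hyphen for an en dash (hypothetical helper).
def use_en_dash(text)
  text.tr(NB_HYPHEN, EN_DASH)
end

puts use_en_dash("IEC 60870#{NB_HYPHEN}5#{NB_HYPHEN}104")
```

Keep in mind that this changes the character stored in the output, so searching or copying text from the PDF will yield an en dash rather than a hyphen.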

We should file an issue upstream that Prawn should not break between fragments if the fragment is isolated just to change the font. But I'll admit, this gets complicated very quickly.

@mojavelinux mojavelinux added this to the v1.5.0.alpha.13 milestone Sep 11, 2016
@mojavelinux mojavelinux self-assigned this Sep 11, 2016
@mojavelinux
Member

While there are many scenarios this fix will not address, what we can focus on is the case caused by Asciidoctor PDF, which is the fact that this character is missing from the built-in Noto Serif font.

@mojavelinux
Member

mojavelinux commented Sep 11, 2016

@meisterluk It looks like you're using an older version of Asciidoctor PDF which did not use a fallback font. That emphasizes that the other solution to this problem is to create a theme that does not use the fallback font. You'll notice in your case that you got the right behavior, but the glyph was still missing. That reveals that the character is missing from the main font and that a fallback font is not being used.

In other words, a side-effect of using the fallback font is that it introduced break opportunities where there shouldn't be break opportunities.

mojavelinux added a commit to mojavelinux/asciidoctor-pdf that referenced this issue Sep 11, 2016
- non-breaking hyphen must be included in font or else arranger
  inadvertently introduces line break opportunity when partitioning
  glyph into separate fragment to apply fallback font
@meisterluk
Contributor

@mojavelinux I can confirm what you said and my version information provided above indicates so. Thanks!

@mojavelinux
Member

👍

@mojavelinux
Member

I'm still curious why Prawn doesn't keep characters together across fragments. I tried to understand the logic in Prawn but I couldn't figure out how it works in the time I had.

https://github.com/prawnpdf/prawn/blob/2.1.0/lib/prawn/text/formatted/line_wrap.rb#L40-L57

@meisterluk
Contributor

Based on a discussion I just had, I would like to point out that prawn falls back to Win1252 for PDF builtin fonts. To the best of my knowledge, Win1252 does not have a non-breaking hyphen. No builtin font is used here (Noto is provided as a TTF file), so I'm not sure this contributes anything to the discussion, but it needs to be considered when using other fonts 😏

@mojavelinux
Member

> I would like to point out that prawn falls back to Win1252 for PDF builtin fonts

That's only true if you are not using a custom TTF font. If you are using a custom TTF font, the behavior is totally different. If you specify a fallback font, Prawn will use that. If you don't, it will silently leave the character out.

@mojavelinux
Member

(To be honest, the way Prawn handles fonts...leaves a lot to be desired...but it's gotten us this far).

@meisterluk
Contributor

@mojavelinux Sorry, I meant "custom TTF font", not "PDF builtin font". Then our statements correspond 👍

Hm, now I use

meisterluk@sensei ~ % asciidoctor --version
Asciidoctor 1.5.4 [http://asciidoctor.org]
Runtime Environment (ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:- ex:UTF-8)
meisterluk@sensei ~ % asciidoctor-pdf --version
Asciidoctor PDF 1.5.0.alpha.12 using Asciidoctor 1.5.4 [http://asciidoctor.org]
Runtime Environment (ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:- ex:UTF-8)

but I still get the same visual result I posted above. I replaced notoserif-regular-subset.ttf with the font provided in commit ccbdebe (not sure this should be tested like that?!). Same result.

If I understand you correctly, this is not the result you'd expect based on your explanation above, right?

@mojavelinux
Member

> Then our statements correspond

Almost, though it makes me realize I should be even more clear about this in the theming guide.

If Prawn can't find a character in the TTF font (either the primary or the fallback font), it will just put the character information in the document without the outline information (aka the strokes). You can copy that blank character into another document and see that it's really there.

When using a built-in (AFM) font, if Prawn can't find a character, then our missing character handler is invoked, which adds a logical not character to the document. See https://github.com/asciidoctor/asciidoctor-pdf/blob/949917ab20076a621e9c6d80e976539385ec45c4/lib/asciidoctor-pdf/prawn_ext/font/afm.rb.

That's only used if you're not using a TTF font.
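To illustrate the built-in font case, here is a sketch in plain Ruby of substituting a placeholder for characters that Windows-1252 cannot represent. It is not the handler from afm.rb linked above; `win1252?`, `substitute_missing`, and `PLACEHOLDER` are hypothetical names, and the choice of U+00AC (NOT SIGN) for the "logical not" character is an assumption.

```ruby
PLACEHOLDER = "\u00AC" # NOT SIGN, assumed here to be the "logical not" character

# True if the character can be represented in Windows-1252.
def win1252?(char)
  char.encode(Encoding::Windows_1252)
  true
rescue Encoding::UndefinedConversionError
  false
end

# Replace every character Windows-1252 cannot represent with the placeholder.
def substitute_missing(text)
  text.each_char.map { |c| win1252?(c) ? c : PLACEHOLDER }.join
end

puts win1252?("\u2011") # non-breaking hyphen => false
puts win1252?("\u2013") # en dash => true
puts substitute_missing("60870\u20115\u2011104")
```

The two checks also bear out the earlier remark in this thread: Windows-1252 has an en dash but no non-breaking hyphen.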

Again, I don't entirely agree that Prawn is handling font characters correctly. I'd like to receive that callback even when using TTF fonts so we can place a fallback character into the document.

mojavelinux added a commit to mojavelinux/asciidoctor-pdf that referenced this issue Sep 11, 2016
- non-breaking hyphen must be included in font or else arranger
  inadvertently introduces line break opportunity when partitioning
  glyph into separate fragment to apply fallback font
@mojavelinux
Member

> Same result.

I'm not convinced that you have all the right files in place. Can you be more specific about how you are invoking Asciidoctor PDF? For testing, I strongly recommend using the gem directly...preferably with Bundler.

Gemfile

source 'https://rubygems.org'
gem 'asciidoctor-pdf', github: 'mojavelinux/asciidoctor-pdf', branch: 'issue-462'

Then run:

$ rm -f Gemfile.lock
  bundle config --local github.https true
  bundle --path=.bundle/gems --binstubs=.bundle/.bin

Then you can run Asciidoctor PDF either using:

$ bundle exec asciidoctor-pdf input.adoc

or

$ ./.bundle/.bin/asciidoctor-pdf input.adoc

That way, you know which version you are using.

@meisterluk
Contributor

@mojavelinux Thanks for the command lines! Simplifies a few things. Apparently I screwed up a few things in my mind. Spent too many hours with asciidoctor today 😵

I executed the lines you posted, additionally with

echo "Aaaa bbbbbb ccccccccc ddddd eeeeeeeee ffffffffffffff gggggggggggg hhhhhhh iiiiiii RFC{nbsp}P{nbsp}IEC{nbsp}60870‑5‑104" > input.adoc

Then I get …

[screenshot: nbhyphen]

… which is the desired behavior. So everything should be fine and we found a temporary solution to this issue, right?

@mojavelinux
Member

\o/ :beers:

This is a permanent solution, but very specific to this use case. If something causes the hyphen to be isolated in a fragment by itself (for instance, if you try to bold it), then the line will break there. The general solution is to figure out why Prawn is allowing breaks at the boundaries of fragments, since fragments are merely an internal data structure that should have no bearing on line wrapping behavior. In other words, it's a bug in Prawn.

@mojavelinux
Member

> if you try to bold it

meaning if you try to put just the non-breaking hyphen in bold.
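To make the bold case concrete, here is a sketch of what the fragments might look like and where a break could still sneak in. The fragment array is illustrative (it uses the `styles: [:bold]` key from Prawn's formatted text), and `risky_boundaries` is a hypothetical helper, not something Prawn or Asciidoctor PDF provides.

```ruby
NB_HYPHEN = "\u2011" # NON-BREAKING HYPHEN

# Fragments as they might look if only the non-breaking hyphen were bold.
fragments = [
  { text: 'IEC 60870' },
  { text: NB_HYPHEN, styles: [:bold] },
  { text: '5' }
]

# Return each index i where the boundary between fragment i and i + 1
# touches the given character, i.e. where a line break would be unwanted.
def risky_boundaries(fragments, char)
  pairs = fragments.each_cons(2).to_a
  pairs.each_index.select do |i|
    left, right = pairs[i]
    left[:text].end_with?(char) || right[:text].start_with?(char)
  end
end

p risky_boundaries(fragments, NB_HYPHEN) # => [0, 1]
```

Both boundaries touch the hyphen, so even with the glyph present in the main font, the line could break on either side of it.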

mojavelinux added a commit to mojavelinux/asciidoctor-pdf that referenced this issue Sep 15, 2016
- The non-breaking hyphen glyph must be included in font or else Prawn's
  text arranger inadvertently introduces a line break opportunity when
  it partitions character into a separate fragment in order to apply the
  fallback font (only happens when a fallback font is used)
fapdash pushed a commit to vogellacompany/asciidoctor-pdf that referenced this issue Dec 13, 2016
…asciidoctor#550)

- The non-breaking hyphen glyph must be included in font or else Prawn's
  text arranger inadvertently introduces a line break opportunity when
  it partitions character into a separate fragment in order to apply the
  fallback font (only happens when a fallback font is used)