Wrong Position of Accents for Sequences of DIN 91379 #777

Open
vk-github18 opened this issue Oct 8, 2021 · 9 comments

@vk-github18

Wrong position of accents for sequences defined in DIN 91379

Describe the bug

The position of the accents is incorrect for most of the character sequences
defined in the following specification:

DIN SPEC 91379: Characters in Unicode for the electronic processing of names
and data exchange in Europe; with digital attachment
https://www.xoev.de/downloads-2316#StringLatin
https://www.din.de/de/wdc-beuth:din21:301228458

For example, with the sequence 0041 030B (LATIN CAPITAL LETTER A followed by
COMBINING DOUBLE ACUTE ACCENT), the accent appears to the right of the letter A
instead of above it.
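
A small check of why such sequences depend on glyph positioning at all: Unicode has no precomposed character for A with double acute, so normalization cannot replace the pair with a single glyph. The sketch below uses only java.text.Normalizer from the JDK; the class and method names are made up for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of OPEN HTML TO PDF or PDFBox.
final class CombiningSequenceCheck {

    /** Returns true if NFC normalization collapses the text into one code point. */
    static boolean hasPrecomposedForm(String sequence) {
        String nfc = Normalizer.normalize(sequence, Normalizer.Form.NFC);
        return nfc.codePointCount(0, nfc.length()) == 1;
    }

    public static void main(String[] args) {
        // A + COMBINING DOUBLE ACUTE ACCENT: stays a two-code-point sequence.
        System.out.println(hasPrecomposedForm("A\u030B")); // prints "false"
        // O + COMBINING DOUBLE ACUTE ACCENT: composes to U+0150.
        System.out.println(hasPrecomposedForm("O\u030B")); // prints "true"
    }
}
```

Sequences for which this returns false can only be rendered correctly if the renderer places the mark glyph relative to the base glyph.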

To Reproduce

Render Din91379-Letters.html and Din91379-List.html with OPEN HTML TO PDF.

Expected behavior

The correct rendering should look like the output of HarfBuzz hb-view 2.9.1
for Din91379-Sequences.txt, see Din91379-Sequences.png.
HarfBuzz uses the info in the OpenType GPOS table for the positioning of
combining diacritical marks.

hb-view.exe -o Din91379-Sequences.png NotoSans-Regular.ttf < Din91379-Sequences.txt
See https://github.com/harfbuzz/harfbuzz.

Screenshots

Rendering with OPEN HTML TO PDF

image

Rendering with HarfBuzz

Din91379-Sequences

System (please complete the following information):

OS: Windows 10
Used Font: NotoSans, NotoSansMath,
see https://github.com/googlefonts/noto-fonts/tree/main/hinted/ttf/NotoSans,
https://github.com/googlefonts/noto-fonts/tree/main/hinted/ttf/NotoSansMath

Additional context

See also
https://issues.apache.org/jira/browse/PDFBOX-4951
LibrePDF/OpenPDF#442
https://issues.apache.org/jira/browse/FOP-2969
googlefonts/noto-fonts#1882

Files

Letters of DIN91379

din91379_letters_all.txt
din91379_list_all.txt
Din91379-Sequences.txt

HTML-Files

Din91379-Letters.html
Din91379-List.html

PDF-files rendered with OPEN HTML TO PDF

Din91379-Letters.html.pdf
Din91379-List.html.pdf

Java program to reproduce the bug

Test1.java

@syjer
Contributor

syjer commented Oct 8, 2021

I would guess it's the same issue as #763

@vk-github18
Author

Yes, both issues suffer from the lack of a text shaping engine like HarfBuzz.
It should be possible to implement the change I proposed in
https://issues.apache.org/jira/browse/PDFBOX-4951 (comment of 28 Nov 2020)
at the interface between OPEN HTML TO PDF and PDFBox, with no change to PDFBox required.

@danfickle
Owner

I've started work on modernizing the advanced shaping PR for pdfbox at danfickle/pdfbox.

The files are under:

It is at a very early stage, but as a proof of concept this is what I'm producing:
image

@vk-github18
Author

@danfickle Glad to hear that you are working on support for advanced glyph layout. You chose the hard way, implementing all the needed functionality yourself, whereas I proposed using the glyph layout provided by the Java platform.
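
For readers unfamiliar with the Java platform route, here is a minimal sketch of obtaining glyph positions from Java2D. It is not the code from the PDFBOX-4951 proposal. It only uses Font.layoutGlyphVector and GlyphVector.getGlyphPositions from java.awt; the class name, the helper secondGlyphOffsetX and the font size of 20 are assumptions for illustration. Whether the marks end up correctly placed depends on the JDK's layout engine and on the font.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.io.File;
import java.util.Arrays;

// Hypothetical sketch, not taken from OPEN HTML TO PDF or PDFBox.
final class Java2dLayoutSketch {

    /** Lays out the text with Java2D; returns x/y pairs per glyph plus the end position. */
    static float[] glyphPositions(Font font, String text) {
        FontRenderContext frc = new FontRenderContext(null, true, true);
        char[] chars = text.toCharArray();
        GlyphVector gv = font.layoutGlyphVector(
                frc, chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
        return gv.getGlyphPositions(0, gv.getNumGlyphs() + 1, null);
    }

    /** Horizontal distance between the first and the second entry of a positions array. */
    static float secondGlyphOffsetX(float[] positions) {
        return positions[2] - positions[0];
    }

    public static void main(String[] args) throws Exception {
        // Pass a path to e.g. NotoSans-Regular.ttf; otherwise a logical font is used.
        Font font = args.length > 0
                ? Font.createFont(Font.TRUETYPE_FONT, new File(args[0])).deriveFont(20f)
                : new Font(Font.SANS_SERIF, Font.PLAIN, 20);
        float[] positions = glyphPositions(font, "A\u030B");
        System.out.println(Arrays.toString(positions));
        System.out.println("x offset of second entry: " + secondGlyphOffsetX(positions));
    }
}
```

The returned positions could then be translated into per-glyph offsets when writing the text to the PDF content stream, which is the part that would live at the interface to PDFBox.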

I tried to lay out the DIN 91379 sequences with the code in the AdvancedTextLayout example but failed, because the font NotoSans-Regular could not be loaded. At the moment this font has, IMHO, the best support for DIN 91379 among the freely available fonts.

The error occurs when calling
OpenTypeFont otFont = fontParser.parse(fontFile);
for Noto Sans Regular:

java.lang.UnsupportedOperationException: coverage set class table not yet supported
at org.apache.fontbox.ttf.advanced.GlyphClassTable$CoverageSetClassTable.<init>(GlyphClassTable.java:262)
at org.apache.fontbox.ttf.advanced.GlyphClassTable.createClassTable(GlyphClassTable.java:95)
at org.apache.fontbox.ttf.advanced.AdvancedTypographicTableReader.readGDEFMarkGlyphsTableFormat1(AdvancedTypographicTableReader.java:3371)
at org.apache.fontbox.ttf.advanced.AdvancedTypographicTableReader.readGDEFMarkGlyphsTable(AdvancedTypographicTableReader.java:3384)
at org.apache.fontbox.ttf.advanced.AdvancedTypographicTableReader.readGDEF(AdvancedTypographicTableReader.java:3447)
at org.apache.fontbox.ttf.advanced.AdvancedTypographicTableReader.read(AdvancedTypographicTableReader.java:136)
at org.apache.fontbox.ttf.advanced.GlyphDefinitionTable.read(GlyphDefinitionTable.java:105)
at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:399)
at org.apache.fontbox.ttf.TrueTypeFont.getTable(TrueTypeFont.java:183)
at org.apache.fontbox.ttf.OpenTypeFont.getGDEF(OpenTypeFont.java:123)
at org.apache.fontbox.ttf.advanced.AdvancedTypographicTableReader.initializeGPOS(AdvancedTypographicTableReader.java:3551)
at org.apache.fontbox.ttf.advanced.AdvancedTypographicTableReader.readGPOS(AdvancedTypographicTableReader.java:3501)
at org.apache.fontbox.ttf.advanced.AdvancedTypographicTableReader.read(AdvancedTypographicTableReader.java:140)
at org.apache.fontbox.ttf.advanced.GlyphPositioningTable.read(GlyphPositioningTable.java:106)
at org.apache.fontbox.ttf.TrueTypeFont.readTable(TrueTypeFont.java:399)
at org.apache.fontbox.ttf.TTFParser.parseTables(TTFParser.java:187)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:164)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:91)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:101)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:79)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:27)
at org.apache.fontbox.ttf.TTFParser.parse(TTFParser.java:86)
at org.apache.fontbox.ttf.OTFParser.parse(OTFParser.java:73)
at org.apache.pdfbox.examples.pdmodel.AdvancedTextLayoutSequencesDin91379.testAdvancedLayout(AdvancedTextLayoutSequencesDin91379.java:199)
at org.apache.pdfbox.examples.pdmodel.AdvancedTextLayoutSequencesDin91379.main(AdvancedTextLayoutSequencesDin91379.java:72)

@vk-github18
Author

Trying to load DejaVuSans with OTFParser results in:
java.lang.UnsupportedOperationException: TTF fonts do not have a CFF table
at org.apache.fontbox.ttf.OpenTypeFont.getCFF(OpenTypeFont.java:73)
at org.apache.fontbox.ttf.OpenTypeFont.getPath(OpenTypeFont.java:92)
at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.createFontDescriptor(TrueTypeEmbedder.java:251)
at org.apache.pdfbox.pdmodel.font.TrueTypeEmbedder.<init>(TrueTypeEmbedder.java:75)
at org.apache.pdfbox.pdmodel.font.PDCIDFontType2Embedder.<init>(PDCIDFontType2Embedder.java:76)
at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:116)
at org.apache.pdfbox.pdmodel.font.PDType0Font.load(PDType0Font.java:192)
at org.apache.pdfbox.examples.pdmodel.AdvancedTextLayoutSequencesDin91379.testAdvancedLayout(AdvancedTextLayoutSequencesDin91379.java:207)
at org.apache.pdfbox.examples.pdmodel.AdvancedTextLayoutSequencesDin91379.main(AdvancedTextLayoutSequencesDin91379.java:75)

@vk-github18
Author

I added a little test to https://github.com/vk-github18/pdfbox

examples/src/main/java/org/apache/pdfbox/examples/pdmodel/AdvancedTextLayoutSequencesDin91379.java

to compare the computation of the layout vector and the rendering of glyphs with Java2D and AdvancedTextLayout for some fonts.
The layout vectors are nearly identical (taking a factor of 50 into account). The rendering, surprisingly, is different.
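
The kind of comparison meant here can be sketched as follows. This is not the code from the test class mentioned above. The method name, the tolerance and the direction of the scaling (Java2D values multiplied by the factor) are assumptions; only the factor of 50 comes from the observation itself.

```java
// Hypothetical sketch of comparing two layout vectors up to a constant factor.
final class LayoutVectorCompare {

    /**
     * Returns true if every Java2D value, multiplied by the factor, lies
     * within the tolerance of the corresponding AdvancedTextLayout value.
     */
    static boolean nearlyEqualAfterScaling(
            float[] java2d, float[] advanced, float factor, float tolerance) {
        if (java2d.length != advanced.length) {
            return false;
        }
        for (int i = 0; i < java2d.length; i++) {
            if (Math.abs(java2d[i] * factor - advanced[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up numbers, only to show the call.
        float[] java2d = {0f, 11.2f};
        float[] advanced = {0f, 560f};
        System.out.println(nearlyEqualAfterScaling(java2d, advanced, 50f, 1f));
    }
}
```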

@vk-github18
Author

The error "java.lang.UnsupportedOperationException: coverage set class table not yet supported"
is solved by applying the following FOP patch:
apache/xmlgraphics-fop@551007e

@vk-github18
Author

vk-github18 commented May 27, 2022

I did some prototyping based on your branch of PDFBox, see https://github.com/vk-github18/pdfbox.
The resulting positioning looks good. Only when a base letter has two combining diacritics is the second one positioned
incorrectly. In this case the positioning information in the layout vector is wrong. I will clean this up and prepare a pull request for
danfickle/pdfbox in the next few days.

image

@vk-github18
Author

vk-github18 commented Jun 3, 2022

You can find the pull request at danfickle/pdfbox#2.
I also opened a pull request for PDFBox, see apache/pdfbox#143,
and started a discussion at https://issues.apache.org/jira/projects/PDFBOX/issues/PDFBOX-4951?filter=allopenissues
