Clarification of current PDF viewer support for AccSupp ActualText #2

123geek opened this issue Sep 24, 2020 · 16 comments

@123geek

123geek commented Sep 24, 2020

Hi, we have just benchmarked all the popular PDF viewers we could think of for AccSupp support, and this is the outcome:

AccSupp copy-paste support per PDF viewer as of 24 September 2020

Full or limited support:

  • Evince (2.32.0.145 on Windows and 3.38.0 on macOS), tested with up to 112,000 characters and it worked.

Broken support:

  • Adobe Acrobat Reader DC (2020.012.20043 on Windows): Empty lines and indentation are reproduced correctly. It would not process a 100,000-character AccSupp: it complained that the PDF was broken and told the user to contact the author. It would open a PDF with an 8063-character AccSupp, but it only returns a seemingly random selection of 4248 of them, i.e. Adobe Acrobat sometimes corrupts the output.

  • Internet Explorer (on Windows): Appears to use Adobe Acrobat internally, same broken behavior as Acrobat.

  • Chrome (85.0.4183.21 on Windows and 85.0.4183.121 on macOS): Supported, but truncates to the first 16383 characters, removes newlines (which also removes empty lines) and replaces double spaces with single spaces.

No support:

  • Firefox (80.0.1 64bit on Windows)

  • MuPDF (1.17.0 on Windows)

  • PDF-XChange Viewer (2.5 on Windows)

  • SumatraPDF (3.2 64bit on Windows)


Reproduction script:
Replace HEX with the 8K or >100K characters of test text, hex-encoded as UTF-16BE.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{verbatim}
\usepackage{listings}
\usepackage[space=true]{accsupp}
\begin{document} 
\BeginAccSupp{method=hex,unicode,ActualText=HEX}
\begin{verbatim}
copyme
\end{verbatim}
\EndAccSupp{}
\end{document}
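A minimal sketch of how the HEX placeholder could be filled in programmatically. The helper names and the filler text are hypothetical; the only assumption carried over from the thread is that `method=hex,unicode` takes UTF-16BE code units written as hex digits (as in `0048` for "H").

```python
# Template copied from the reproduction script above; HEX is the placeholder.
TEMPLATE = r"""\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{verbatim}
\usepackage{listings}
\usepackage[space=true]{accsupp}
\begin{document}
\BeginAccSupp{method=hex,unicode,ActualText=HEX}
\begin{verbatim}
copyme
\end{verbatim}
\EndAccSupp{}
\end{document}
"""


def actualtext_hex(text: str) -> str:
    """Return the text as UTF-16BE code units in hex, 4 digits per unit."""
    return text.encode("utf-16-be").hex()


def build_test_document(text: str) -> str:
    """Substitute the hex-encoded text for the HEX placeholder."""
    return TEMPLATE.replace("HEX", actualtext_hex(text))


if __name__ == "__main__":
    payload = "x" * 8063  # hypothetical filler with the length used in the test
    print(len(actualtext_hex(payload)))  # 32252 hex digits
```

Writing the result of `build_test_document(...)` to a `.tex` file gives one test document per payload length.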
@josephwright

There are documented limits on the length of various PDF constructs: see the PDF Reference Manual. It would not surprise me if Acrobat were simply enforcing them.

@123geek
Author

123geek commented Sep 24, 2020

@josephwright interesting, thank you for pointing that out. For completeness, would you mind sharing the URL and the page/section where this is written?

@josephwright

https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf, toward the end: 'Implementation Limits'.

@123geek
Author

123geek commented Sep 24, 2020

@josephwright which entry on which page is the limit that applies to AccSupp ActualText? Is it "string (in content stream)", which has a limit of 32,767 bytes?

That would be max 16,383 UTF16 characters, and would coincide with what Chrome renders.
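The arithmetic behind that figure, as a small sketch. Whether a leading FEFF byte order mark counts toward the limit is an assumption here, not something the thread establishes; the second call only shows how the number would shift if it did.

```python
STRING_LIMIT_BYTES = 32767    # "string (in content stream)" limit quoted above
BYTES_PER_UTF16_UNIT = 2      # one UTF-16 code unit
BOM_BYTES = 2                 # FEFF marker, if it is stored inside the string (assumption)


def max_utf16_units(limit_bytes: int = STRING_LIMIT_BYTES, with_bom: bool = False) -> int:
    """Largest number of UTF-16 code units that fits within limit_bytes."""
    usable = limit_bytes - (BOM_BYTES if with_bom else 0)
    return usable // BYTES_PER_UTF16_UNIT


print(max_utf16_units())               # 16383
print(max_utf16_units(with_bom=True))  # 16382
```

Characters outside the Basic Multilingual Plane take two code units each, so the character count can be lower than the code unit count.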

(Because PDF copy-paste is generally so unreliable, I guess trying to line up multiple AccSupps in order to get multiples of 32,767 bytes would be a generally bad idea. That would be a question for discussion in a separate thread though e.g. https://tex.stackexchange.com/questions/563803/how-make-a-latex-document-that-generates-a-pdf-from-which-copy-paste-works-corre .)

@123geek
Author

123geek commented Sep 25, 2020

After further testing we realized that Chrome always outputs the AccSupp text as one single line, meaning its AccSupp support is broken. We updated the topmost post in this thread accordingly. This means that to date only Evince has AccSupp support that works.

@josephwright @davidcarlisle, do you have any expectation that other PDF viewers will start delivering AccSupp ActualText correctly in the future?

@u-fischer
Contributor

I have no idea what PDF viewers will do here in the future. But why do you claim that they don't handle it correctly already? Nowhere in the specification is /ActualText described as a means to store long code listings with thousands of characters. Actually the specification says "This replacement text (which should apply to as small a piece of content as possible) ...".

@123geek
Author

123geek commented Sep 26, 2020

@u-fischer thanks for your response. To understand this better, would you mind sharing your view of this?:

On our part the goal is copy-paste that works just like it does in HTML/web browsers, that is, leading spaces and empty lines are preserved (just like double spaces and all other characters within lines).

Would you say that PDF's copy-paste behavior is so ambiguous that, if you put several AccSupps in a sequence, you have absolutely no idea what will come out of it?

If sequences could be relied on, then the "small piece" in your PDF specification quote could be satisfied. Of course, what counts as a "small piece" is open: I think 10 A4 pages' worth of characters is a small piece, though I take it that someone else could suggest that "small" means <100. To satisfy that we could make one AccSupp per 100 characters.

Two days from now we will submit bug reports to all popular PDF viewers with incorrect AccSupp handling, that they should copy-paste AccSupp ActualText correctly.

@123geek
Author

123geek commented Sep 26, 2020

Only for completeness, in https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf :

The "small a piece" quote you made is from the definition of ActualText on page 559, it reads:

(Optional; PDF 1.4) Text that is an exact replacement for the structure element and its children. This replacement text (which should apply to as small a piece of content as possible) is useful when extracting the document’s contents in support of accessibility to users with disabilities or for other purposes (see 14.9.4, “Replacement Text”).

Section "14.9.4 Replacement Text" is on page 615 and reads:

Just as alternate descriptions can be provided for images and other items that do not translate naturally into text (as described in the preceding sub-clause), replacement text can be specified for content that does translate into text but that is represented in a nonstandard way. These nonstandard representations might include, for example, glyphs for ligatures or custom characters, or inline graphics corresponding to letters in an illuminated manuscript or to dropped capitals.

Replacement text may be specified for the following items:

• A structure element (see 14.7.2, “Structure Hierarchy”), by means of the optional ActualText entry (PDF 1.4) of the structure element dictionary.
• (PDF 1.5) A marked-content sequence (see 14.6, “Marked Content”), through an ActualText entry in a property list attached to the marked-content sequence with a Span tag.

The ActualText value shall be used as a replacement, not a description, for the content, providing text that is equivalent to what a person would see when viewing the content. The value of ActualText shall be considered to be a character substitution for the structure element or marked-content sequence. If each of two (or more) consecutive structure or marked-content sequences has an ActualText entry, they shall be treated as if no word break is present between them.

The last sentence, that a sequence of AccSupps should "be treated as if no word break is present between them", is interesting. For our copy-paste use case, we should make a TeX test where we make one AccSupp per character and see if the viewer honors it. It should?
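A sketch of how the line for such a one-AccSupp-per-character test could be generated. The function name is hypothetical, and the markup assumes empty AccSupps (no visible content) with `method=hex,unicode`, as used elsewhere in this thread.

```python
BEGIN = r"\BeginAccSupp{method=hex,unicode,ActualText="
END = r"}\EndAccSupp{}"


def accsupp_per_character(text: str) -> str:
    """Emit one empty AccSupp per character, each carrying that character as ActualText."""
    parts = []
    for ch in text:
        # UTF-16BE code units as hex; a non-BMP character yields 8 hex digits.
        hex_units = ch.encode("utf-16-be").hex()
        parts.append(BEGIN + hex_units + END)
    return "".join(parts)


print(accsupp_per_character("Hl"))
```

The output is a single TeX line that can be placed between `\begin{document}` and `\end{document}`.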

@davidcarlisle
Member

one AccSupp per character

That is, I think, the more normal use. In fact you only really need it for characters that don't have a standard Unicode encoding in the font, which here is possibly just the white space; the other characters should copy naturally without this.

@u-fischer
Contributor

one AccSupp per character

Well, a PDF contains glyphs, not characters. And this distinction is important here: ActualText allows you to attach "characters" to a glyph (or a picture, or a small cluster of glyphs). As described here https://tex.stackexchange.com/a/564164/2388 this is important for languages where you can't simply assign one or more ToUnicode values to every glyph to represent the input, for example when the font shaping changes the order. Such scripts use ActualText to add the correct characters in such cases (and so enable copy&paste), and they use it for small pieces, in the Devanagari example around two glyphs:

/Span<</ActualText<FEFF092A093F>>>BDC
BT
/F33 9.96264 Tf
1 0 0 1 333.2 707.125 Tm [<025F002E>]TJ
ET
EMC
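To see what that ActualText entry carries, the hex string can be decoded. This is a small sketch with a hypothetical helper name; it only handles strings that start with the FEFF marker, as the one above does.

```python
def decode_actualtext(hex_string: str) -> str:
    """Decode a hex ActualText string that starts with the FEFF byte order mark."""
    raw = bytes.fromhex(hex_string)
    if not raw.startswith(b"\xfe\xff"):
        # Strings without the marker use a different encoding; not handled in this sketch.
        raise ValueError("expected a UTF-16BE string starting with FEFF")
    return raw[2:].decode("utf-16-be")


text = decode_actualtext("FEFF092A093F")
print([f"U+{ord(c):04X}" for c in text])  # ['U+092A', 'U+093F']
```

The two code points are DEVANAGARI LETTER PA followed by DEVANAGARI VOWEL SIGN I, i.e. the logical character order, attached to the two glyphs shown by the `TJ` operator.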

Marked Content operators with ActualText inside the page stream can not be nested. If you start to add it around long pieces of texts, you would break the copy&pasting of these scripts. This may not matter for your use case, but it means that it can't be a general solution to improve copy&pasting of code. (ActualText can also be added to structure elements, but it is rather unclear how well pdf viewer support this, and how they handle nesting in such cases).

Besides this: As a user I wouldn't want to have to extract long code listings by copy&paste from a PDF. I would greatly prefer an attachment or an embedded file that I can simply save or open in an editor, as I can then be more confident that I got the real code, and not something that got modified by the PDF viewer or the OS.

@123geek
Author

123geek commented Sep 26, 2020

Here follows a series of tests of PDF copy-paste behavior:

Test 1
Here is the outcome of a small test of multiple AccSupp:s:

This text:

Hlo
  w
r

ld{

corresponds to this UTF-16BE encoding: 0048006c006f000a002000200077000a0072000a000a006c0064007b
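A quick check of the byte order, as a sketch: decoded as big-endian UTF-16 the hex string yields the text above, while a little-endian reading of the same bytes yields unrelated characters.

```python
HEX = "0048006c006f000a002000200077000a0072000a000a006c0064007b"

raw = bytes.fromhex(HEX)
as_big_endian = raw.decode("utf-16-be")
as_little_endian = raw.decode("utf-16-le")

print(repr(as_big_endian))            # 'Hlo\n  w\nr\n\nld{'
print(f"U+{ord(as_little_endian[0]):04X}")  # U+4800, not U+0048
```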

Here follows a TeX file that delivers this string as single-character AccSupps. None of the AccSupps has any visual representation.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage[space=true]{accsupp}
\begin{document} 
before
\BeginAccSupp{method=hex,unicode,ActualText=0048}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006f}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=000a}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0020}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0020}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0077}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=000a}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0072}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=000a}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=000a}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0064}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=007b}\EndAccSupp{}
after
\end{document}
  • When copying this out in Adobe Reader DC, I get:
before Hlo
w
r
ld{after
1

That is, correct except verbatim copy-paste is broken as leading spaces and empty lines have been stripped.

  • When copying this out in Evince, I get this paste:
before H after
{
d
r
w


o
l
1

The order is broken and the second "l" is gone.

  • Chrome gives this:
before after
1

I.e. it conveniently ignores the AccSupps altogether.

So, we see that current PDF viewers have a very peculiar way of handling sequences of AccSupp ActualTexts. Specifically, an AccSupp ActualText which only contains a space or a newline will (tend to) be ignored.

This gives weight to the idea of locating a whole copy-paste inside one single AccSupp. Let's try a middle-ground approach:


Test 2
Let's make a sequence of AccSupp:s that are as short as possible, WHILE never breaking at newline or space. I'll use this text:

Hl
  ow
rl

d{

And only break between "H" and "l", between "o" and "w", between "r" and "l", and between "d" and "{". That is Tex:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage[space=true]{accsupp}
\begin{document} 
before
\BeginAccSupp{method=hex,unicode,ActualText=0048}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a0020006f}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0077000a0072}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a000a0064}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=007b}\EndAccSupp{}
after
\end{document}
  • Incredibly, Adobe Reader renders this piece correctly - its paste is:
before Hl
 ow
rl

d{after
1

just wow!

  • Evince meanwhile disappoints, it copies out in a similar half reverse order as in the previous test:
before H
{after
l

d
w
r
l
 o
1
  • And Chrome disappoints in the same way, it ignores the AccSupps altogether:
before after
1

I'll now make a set of additional tests to look for unexpected behavior relating to AccSupps.


Test 3
For completeness, please note that AccSupp is fragile - here is a repetition of the previous section but with one character (A-E) added in each AccSupp:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage[space=true]{accsupp}
\begin{document} 
before
\BeginAccSupp{method=hex,unicode,ActualText=0048}A\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a0020006f}B\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0077000a0072}C\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a000a0064}D\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=007b}E\EndAccSupp{}
after
\end{document}
  • Adobe Reader DC copies this as
before H l
 o w
r l

d { after
1

i.e. it now suddenly starts kicking in a space character between each AccSupp - this is undesired though I guess can't be viewed as totally unreasonable.

  • Evince curiously has correct behavior:
before Hl
 ow
rl

d{ after
1
  • Chrome:
before Al ow rDE after
1

that is total chaos, it sometimes picks the letter (A, D, E) and sometimes the ActualText but when it does so it strips double spaces and newlines.


Test 4
Having the AccSupps contain the actual text gives an even worse outcome:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage[space=true]{accsupp}
\begin{document} 
before
\BeginAccSupp{method=hex,unicode,ActualText=0048}H\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a0020006f}l\\  o\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0077000a0072}w\\r\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a000a0064}l\\\\d\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=007b}\{\EndAccSupp{}
after
\end{document}
  • Adobe Reader DC:
before H l
 o
w
r
l

d
{ after
1

That is the whitespaces between AccSupps we saw when adding in characters A-E have now become newlines, that is even worse.

  • Evince:
before H
l
 o
w
r
l

d{ after
1

That is the same as Adobe Reader DC: it now adds newlines between AccSupps.

  • Chrome:
before Hl o
w r
l d
{ after
1

gibberish, it produces the visual characters and then treats each AccSupp as either nothing or a newline.

So our testing to this point shows that only AccSupp ActualTexts with no visual content give correct verbatim copy-paste behavior.


Test 5
I now integrate the previous observations as follows:

First, a separate set of AccSupps that have the intended text as ActualText and no visual text in them. This will be one AccSupp per character, except for spaces and newlines which will be incorporated in AccSupps in such a way that there is never a space or newline at the beginning or end of an AccSupp.

Then subsequently separately, I'll visually show the ActualText, in such a way that each visual character is contained in an AccSupp with empty ActualText, this way the visual part should never be considered for copying by the PDF viewer, hence avoiding a "double copy" of the text in the copy. Tex:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage[space=true]{accsupp}
\begin{document} 
before
\BeginAccSupp{method=hex,unicode,ActualText=0048}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a0020006f}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0077000a0072}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a000a0064}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=007b}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}H\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}l\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\\\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=} \EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=} \EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}o\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}w\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\\\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}r\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}l\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\\\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\\\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}d\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\{\EndAccSupp{}
after
\end{document}
  • Adobe Reader DC:
before Hl
 ow
rl

d{
after
1

That is beautifully correct.

  • Evince:
before H
{
l

d
w
r
l
 o
after
1

Same partial reverse order bug as above.

  • Chrome:
before Hl
ow
rl
d{ after
1

The output is consistent with Chrome ignoring the AccSupps.

In essence this outcome was greatly encouraging. Its only shortcoming is that the selection for copying the whole section must be made exactly at the beginning of the text block, and selecting text within the text block copies nothing. This is not how text selection normally works, and people would find it unintuitive.


Test 6
Here is what could have been an optimal strategy for verbatim copy-paste in PDF, based on these observations:

The intended text is organized into what we call a sequence of clusters. Each individual character is a cluster, except around spaces and newlines, which are merged with their neighboring characters in such a way that there is never a space or newline at the beginning or end of a cluster.
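The clustering rule can be sketched as follows. The function name is hypothetical, and the regular expression is one possible reading of the rule: a non-whitespace character, extended across any whitespace run to the next non-whitespace character. The sample string is written with the single leading space that the hex strings in the tests above encode (`006c000a0020006f`).

```python
import re


def split_into_clusters(text: str) -> list[str]:
    """Split text into clusters that never start or end with whitespace."""
    if text != text.strip():
        # Leading/trailing whitespace has no neighbor to attach to under this rule.
        raise ValueError("text must not start or end with whitespace")
    clusters = re.findall(r"\S(?:\s+\S)*", text)
    assert "".join(clusters) == text  # nothing is lost or reordered
    return clusters


clusters = split_into_clusters("Hl\n ow\nrl\n\nd{")
print(clusters)
# ['H', 'l\n o', 'w\nr', 'l\n\nd', '{']
print([c.encode("utf-16-be").hex() for c in clusters])
# ['0048', '006c000a0020006f', '0077000a0072', '006c000a000a0064', '007b']
```

Text with whitespace between every pair of characters, such as "a b c", ends up as a single cluster under this reading.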

We will then produce the Tex as follows, for each cluster:

\BeginAccSupp{method=hex,unicode,ActualText=[HEX OF CLUSTER HERE]}\EndAccSupp{}

Followed by, for each character contained in the cluster,

\BeginAccSupp{method=hex,unicode,ActualText=}[CHARACTER HERE]\EndAccSupp{}

If this works, it integrates successful verbatim copy-paste, with the convenience of being able to make the text selection for copying, in the approximate visual location in the document where the text is actually displayed.

Also note that this satisfies AccSupp's requirement to never be made across a page boundary.
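A sketch of a generator for this scheme. The function names are hypothetical, and the mapping from characters to visible TeX only covers the characters used in these tests (newline, braces); other TeX special characters would need their own escapes.

```python
BEGIN = r"\BeginAccSupp{method=hex,unicode,ActualText="
END = r"\EndAccSupp{}"

# Visible TeX for characters that cannot be typed literally (assumption: test text only).
VISUAL = {"\n": r"\\", "{": r"\{", "}": r"\}"}


def accsupp(actual_text: str, visible: str = "") -> str:
    """One AccSupp with hex UTF-16BE ActualText wrapped around the visible content."""
    hex_units = actual_text.encode("utf-16-be").hex()
    return BEGIN + hex_units + "}" + visible + END


def emit_clusters(clusters: list[str]) -> str:
    """For each cluster: an empty AccSupp carrying its text, then each character shown with empty ActualText."""
    parts = []
    for cluster in clusters:
        parts.append(accsupp(cluster))
        for ch in cluster:
            parts.append(accsupp("", VISUAL.get(ch, ch)))
    return "".join(parts)


print(emit_clusters(["H", "l\n o"]))
```

The cluster list is passed in ready-made, so the splitting rule can be changed without touching the emitter.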

That means for the same text as above, this Tex:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage[space=true]{accsupp}
\begin{document} 
before
\BeginAccSupp{method=hex,unicode,ActualText=0048}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}H\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a0020006f}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}l\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\\\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=} \EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=} \EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}o\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=0077000a0072}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}w\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\\\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}r\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=006c000a000a0064}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}l\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\\\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\\\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}d\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=007b}\EndAccSupp{}\BeginAccSupp{method=hex,unicode,ActualText=}\{\EndAccSupp{}
after
\end{document}
  • Adobe Reader DC produces:
before H l
 o
w
r
l

d
{ after
1

I.e. unfortunately not correct, the presence of visual elements is reflected here as spaces (between H and l, and between d and {) and newlines (between o and w and between r and l).

  • Evince:
before H l
 o
w
r
l

d
{ after
1

Curiously the same behavior.

  • Chrome:
before Hl
ow
rl
d{ after
1

So that's a total fail: visuals wrapped in AccSupps with ActualText="" placed between the AccSupp ActualTexts mess up the copy-paste behavior.


Conclusions and discussion
From these tests it appears that current-generation PDF viewers only realistically accommodate the approach of test 5. With that approach copy-paste works, but the copyable text for the whole block sits at the visual beginning of the block: selecting there and copying yields the whole block in one go, so there is no way to copy only part of it, and selecting text within the block and copying yields nothing at all.

In test 6 I attempted to get both correct verbatim copy-paste with ability to make selections within a text block and copy such parts, but this test failed.

@u-fischer, do you see any mistake I made in test 6, that is, do you see any way to get correct verbatim copy-paste from selections within a text block? Any thoughts much appreciated, if you have no thoughts I'll presume it's impossible with PDF (at least currently).

(To discuss next: What bug reports to file to all the PDF viewers.)

@123geek
Author

123geek commented Sep 26, 2020

Marked Content operators with ActualText inside the page stream can not be nested. If you start to add it around long pieces of texts, you would break the copy&pasting of these scripts. This may not matter for your use case, but it means that it can't be a general solution to improve copy&pasting of code. (ActualText can also be added to structure elements, but it is rather unclear how well pdf viewer support this, and how they handle nesting in such cases).

What do you mean?

Perhaps what you said now is the explanation to some of the issues we experienced in trying to get verbatim copy-paste with or without AccSupp ActualText.

Besides this: As a user I wouldn't want to have to extract long code listings by copy&paste from a pdf, I would greatly prefer an attachment or an embedded file, that I can simply save or open in an editor, as I then can be more confident that I got the real code, and not something that got modified by the pdf viewer or the OS.

Right, this is for snippets only, such as compilation (where a configure or build command is typically 5-20 lines), maybe in longest case a short configuration file.

File attachments are useful but also mean hassle, e.g. save the attachment to a file -> devise a location in the home or temporary directory for the file -> open that file separately in an editor -> do select all+copy in that editor -> paste to the right location -> close the separate editor -> delete the temporary file. This is about 30 seconds of extra work per individual copy step.

@123geek
Author

123geek commented Sep 26, 2020

one AccSupp per character

That is I think the more normal use, in fact you only really need it for characters that don't have a standard unicode encoding in the font, which here is possibly just the white space, the other characters should copy naturally without this.

@davidcarlisle are you sure? Per my tests above it looks like copy-paste in PDF is largely broken by default with respect to space indentation and empty lines. My test with making an AccSupp for a whitespace failed totally, did I miss anything?

What are your thoughts about my "test 5" above as the currently optimal way of achieving verbatim copy-paste in PDF? Also, do you have any thoughts about how to make the "test 6" approach work?

@u-fischer
Contributor

Any thoughts much appreciated, if you have no thoughts I'll presume it's impossible with PD

Well, it is nice that you are doing all these tests -- I don't have the time currently -- but basically you are confirming what I thought before: that copy&pasting of code is not reliably possible, and that using ActualText/accsupp is not the right way to enable it.

PDF viewers use heuristics for copy&paste. This is quite nice, as nowadays they get the text more or less right, even if there are no real spaces, or there are hyphens from hyphenation. They often even preserve some formatting and tabular layout. But there is no specification and you don't know what you will get after the next update. If something changes you couldn't even claim that it is a bug.

File attachments are useful but also mean hassle, e.g. save the attachment to a file

In Adobe Reader I can simply double-click on an icon and the file opens in my editor (after a security question).
And the icon doesn't even print ...

\documentclass{article}
\begin{filecontents}{test-attach.txt}
some code
\end{filecontents}
\usepackage{attachfile2,listings}

\begin{document}
\attachfile[icon=PushPin,mimetype=text/plain,print=false]{test-attach.txt}

\lstinputlisting{test-attach.txt}
\end{document}

@123geek
Author

123geek commented Sep 26, 2020

@u-fischer do you see any way that my "test 6" can be fixed so it works, that is, some way to keep visual glyphs between ActualTexts from breaking the copy-paste?

I see your point that attachments clearly will be well preserved. PDF also has a link-to-attachment function, which makes it more convenient to open attachments, doesn't it? E.g. the user would just click the text box and the attachment would open.

Direct copy-paste from a document still has a charm to it. If my understanding of the outcome of test 5 is correct, then I have shown that arbitrary-length verbatim copy-paste is possible (in Adobe Reader DC, and in other viewers too after they fix their bugs), though the whole text must be concentrated at one single location/coordinate in the PDF.

@marco-c

marco-c commented Mar 25, 2022

It would help us if you could attach an example PDF on https://bugzilla.mozilla.org/show_bug.cgi?id=1669335.
