XPS documents from print driver are terribly slow #51930

Open
wstaelens opened this issue Apr 27, 2021 · 7 comments

@wstaelens

wstaelens commented Apr 27, 2021

As advised in /runtime/wpf, I should open a ticket here for the dotnet/runtime team... 🙄

from dotnet/wpf#4000:

Take, for example, a 3000-page document being printed to a V4 driver. Because of the very annoying STA requirement, it takes ages to render the pages sequentially. We can't render the pages in parallel (if that is possible in C#, feel free to explain how); in other words, other code and logic that works on individual pages can't run in parallel and is slow because everything has to run sequentially. Eventually we run out of memory, because we can't hold all the rendered pages for some of the actions we perform.

The performance issues can easily be reproduced with Microsoft's own XPS Viewer and Microsoft XPS Document Writer (printer). When we open the original PDF (3 MB) and print it to the Microsoft XPS Document Writer printer as an .xps, printing takes ages. Once it has been printed, the .xps file has grown to 50 MB. Opening the .xps in Microsoft XPS Viewer and searching for a word (which exists, e.g., on page 2668) takes ages, as the viewer processes the document sequentially. Sumatra finds the word in about 50 seconds; XPS Viewer takes ±6 minutes. (For comparison: Foxit Reader finds it in the original PDF in 25 seconds.)

I can't share this big file (confidential), but just take some PDF files with a lot of pages, e.g. ebooks in PDF, and print them (or print and capture the XPS print jobs with a render filter to catch the XPS on the Microsoft generic V4 driver).

Can these XPS printing issues please be tackled or prioritized?

.NET SDK 5.0.202
.NET runtime 5.0.5
Windows 10 20H2 (19042.928)
Windows Server 2019 1809 (17763.1879)

Linked tickets:

Update

A file that you can test for example:

  1. Navigate to https://www.spaenhiers.be/archief and click on Databank bidprentjes or "Bidprentjes" (the direct link to the .pdf file is sometimes updated: https://www.spaenhiers.be/Media/Default/docs/archief_Bidprentjes_2021-04-19.pdf or https://spaenhiers.files.wordpress.com/2022/05/bidprentjes_2022-02-17.pdf or https://spaenhiers.files.wordpress.com/2022/06/bidprentjes_-2022_06_18.pdf )
  2. Print the file to Microsoft XPS Document Writer (sloooooow 🐌 🏁 🕐)
  3. You'll notice the file size is HUGE compared to the PDF file.
  4. Open the .xps file in Microsoft XPS Viewer, make sure it is on the first page, and search for the value 67588 or Zwertvaegher
  5. Go drink a coffee, eat some pizza, drink 5 beers and come back once it has found it.

(And yes, parsing XPS with .NET 5 is also slow and takes a lot of memory, etc. It needs some performance boosts so that it becomes faster compared to PDF documents.)
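For anyone who wants a baseline number for the search in step 4 outside of XPS Viewer, here is a minimal sketch. It relies on an .xps file being a ZIP-based package whose pages are stored as separate `.fpage` XML parts, with text carried in the `UnicodeString` attribute of `Glyphs` elements. The function names are hypothetical, and sorting pages by the number in the part name is an assumption (the authoritative page order is in the package's fixed-document part). A word split across several glyph runs will be missed, so treat this as a rough timing aid and not as a description of how XPS Viewer or SumatraPDF search.

```python
import re
import zipfile
from xml.etree import ElementTree


def page_number(part_name):
    """Sort key: numeric index taken from a part name such as '.../12.fpage'."""
    match = re.search(r"(\d+)\.fpage$", part_name, re.IGNORECASE)
    return int(match.group(1)) if match else 0


def find_text_in_xps(xps_file, needle):
    """Return the names of .fpage parts whose glyph text contains needle."""
    hits = []
    with zipfile.ZipFile(xps_file) as package:
        # Assumption: page parts end in '.fpage' and are named by page number.
        pages = sorted(
            (n for n in package.namelist() if n.lower().endswith(".fpage")),
            key=page_number,
        )
        for name in pages:
            root = ElementTree.fromstring(package.read(name))
            # Join the text of every element that carries a UnicodeString.
            text = " ".join(
                el.get("UnicodeString")
                for el in root.iter()
                if el.get("UnicodeString")
            )
            if needle in text:
                hits.append(name)
    return hits
```

Usage would look like `find_text_in_xps("bidprentjes.xps", "67588")`, where the file name is a placeholder for whatever the XPS Document Writer produced. Each page part is read and parsed on its own here, so nothing in this sketch depends on an STA thread.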

@wstaelens wstaelens added the tenet-performance Performance related issue label Apr 27, 2021
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 27, 2021
@ghost

ghost commented Apr 28, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: wstaelens
Assignees: -
Labels:

area-System.IO.Compression, tenet-performance, untriaged

Milestone: -

@danmoseley
Member

Thanks for the report, @wstaelens. You mention XPS Reader, so I'm guessing this problem is the same on .NET Framework?

Do you have an interest in debugging/investigating? Realistically that is the most likely way a fix would get in this release.

@wstaelens
Author

@danmoseley yes, I mentioned XPS Viewer just to compare it with, for example, SumatraPDF (which is also capable of viewing XPS documents).

Try searching for something in XPS Viewer and do the same in SumatraPDF (e.g. in a 3000+ page document). You'll notice the difference (e.g. ±50 seconds in Sumatra compared to ±6 minutes).
So generally I believe that if the MS team profiles the code and has some room to update the code base, big steps can be taken in terms of performance and memory allocations for .NET 5 / .NET Core (and .NET Framework 4.8).

We are willing to help, but it is hard to say what exactly is slow, as the code that parses/generates/... the XPS files in an XPS print driver (XPSDrv) is Microsoft internal. We only capture the generated .xps, so I don't think I'll be of great help here... We believe it is a Microsoft-internal thing. When we further process the XPS (compared to e.g. first converting it to PDF or just using EMF), the PDF/EMF format is faster, more optimized, takes less disk space (not true for EMF) and doesn't have the annoying STA requirement that XPS has.

Because XPS with .piece files doesn't seem to be supported in .NET 5 / .NET Core, I expect this might also be a reason why the code base differs or why not everything has been implemented.

In general, XPS is slow for printing, and producing/consuming XPS files takes up much more disk space and memory compared to other technologies. We even seriously considered going back to EMF for this (!!).
The XML format is clean, but XML and the parsing of XML is, yeah... let's say we would like to see improvements. We see an increase in XPS usage, so please don't turn it down this time.

@jozkee jozkee removed the untriaged New issue has not been triaged by the area owner label Jun 23, 2021
@jozkee jozkee added this to the Future milestone Jun 23, 2021
@wstaelens
Author

👋

@wstaelens
Author

Hey, any performance updates?

@znakeeye

I'm rendering an XpsDocument to a MemoryStream which is then converted to a PDF on disk. After some research, I found two things which significantly impact performance.

  1. DynamicResource kills performance. Completely! 😆
  2. Package compression takes time. Use CompressionOption.NotCompressed (or CompressionOption.SuperFast if you really need compression).

DynamicResource performance problem
Usually DynamicResource has performance characteristics similar to StaticResource. At least, there seems to be a consensus in the community that this is the case. But it certainly does not hold true for XPS!

After profiling my Xps generator, it became apparent that FindWeakReference() was a very, very hot path. See issue #4468. Also, please consider prioritizing PR 5610 from @batzen, as it aims to fix this very problem.
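On the second point above (package compression), the trade-off can be illustrated outside of WPF. XPS packages are ZIP-based containers, so the sketch below builds the same set of parts once stored and once deflated using Python's `zipfile`. This is only a rough analogy: `ZIP_STORED` stands in for something like CompressionOption.NotCompressed, the `build_package` helper is hypothetical, and the page data is synthetic, so the numbers say nothing about a real XpsDocument.

```python
import io
import time
import zipfile


def build_package(parts, compression):
    """Write parts (name -> bytes) to an in-memory ZIP; return (size, seconds)."""
    buffer = io.BytesIO()
    start = time.perf_counter()
    with zipfile.ZipFile(buffer, "w", compression=compression) as package:
        for name, data in parts.items():
            package.writestr(name, data)
    elapsed = time.perf_counter() - start
    return len(buffer.getvalue()), elapsed


# Synthetic stand-in for page markup: repetitive XML-like text (an assumption).
page = b'<Glyphs UnicodeString="lorem ipsum" OriginX="10" OriginY="20"/>\n' * 500
parts = {f"Documents/1/Pages/{i}.fpage": page for i in range(1, 51)}

stored_size, stored_time = build_package(parts, zipfile.ZIP_STORED)
deflated_size, deflated_time = build_package(parts, zipfile.ZIP_DEFLATED)

print(f"stored:   {stored_size} bytes in {stored_time:.3f}s")
print(f"deflated: {deflated_size} bytes in {deflated_time:.3f}s")
```

The stored package is larger but skips the compression work, which is the same trade being made when an in-memory XPS is only an intermediate step before conversion to PDF.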


5 participants