Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

for #196 - custom UserAgent / PdfBoxUserAgent #198

Closed
wants to merge 1 commit into from
Closed

for #196 - custom UserAgent / PdfBoxUserAgent #198

wants to merge 1 commit into from

Conversation

delafer
Copy link

@delafer delafer commented Apr 8, 2018

Factory to set custom PdfBoxUserAgents

Copy link
Contributor

@rototor rototor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PullRequest, but the PdfBoUserAgentFactory is a bad design. No singletons please. Sorry.

Singletons with thread locals just cry for class loader leaks in servlet environments. Thats really hard to get right. There is a reason why your application code should just use guice, spring or maybe even dagger for singletons (and all other injections of course). Singletons should only be @Singleton annotated and injected classes.

Also this breaks the API contract of the builder and does not fit the current design. As the builder should only local store settings for the PdfBoxRenderer, but not expose global all builder affecting objects....

It's nice that you try to improve the library, and I am really sorry that I must criticize you here.


/**
* PdfBoxUserAgentFactory. Singleton.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no! Please no singletons....

Copy link
Author

@delafer delafer Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Singleton & Multiton patterns appeared a long before Guice, Spring iOC, Weld, Dagger, etc.
"Singletons should only be @singleton annotated and injected classes."
who has said such a stupid thing?
Singletons should only be @singleton annotated in CDI / DI Enviroments, where dependency injection container is used, without CDI container, you can still use and write Singletons and where are normal classes which have static fields-like behavior.

"does not fit the current design"
LOL. I do not see any design at all. (PdfBoxRenderer = 18 args constructor + 108 lines of code in constructor init method. wonderful design. I am working for about 20 years as developer and have rarely seen such a wonderful design)

return LazyHolder.INSTANCE;
}

private ThreadLocal<UserAgentFactory> threadLocal;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This smells like memory / class loader leaks in Tomcat etc. on WAR unloading... ThreadLocals are hard to do right in a servlet context.

Copy link
Author

@delafer delafer Apr 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have never had any problems with ThreadLocal clases either Tomcat, Jetty, OC4J, Wildfly, Weblogic, etc. I can list / enumerate dozens of well-known / opensource projects using this standard Java class. It's available since JAVA 1.2 ;-) ThreadLocal could lead to memory leaks, but only if you don't know how to use it.

If only this library (openhtmltopdf) was developed using a thoughtful design it would not be necessary at all to create something like this. I almost threw up as I have seen constructor with 18 arguments and constructor itself has 108 lines of code! schoolchildren writing better code.

* UnRegister [threadLocal] own / custom PdfBoxUserAgentFactory
*/
public void unregisterPdfBoxUserAgentFactory() {
threadLocal.remove();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you forget this, you will have a class loader leak in your servlet container on unload....

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just like with streams, if you opened it - you must close it. It is not an argument not to use them at all. Tomcat trying to reuse user threads, which could lead to problems with threadlocal ,what why tomcat developers recommends to clean them manually. Anyway tomcat was never a good choice for a production use in a serious projects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This library can / will be used in whatever context the users want. Also it might be used with Tomcat (or Jetty, which also reuses threads). If we already know that using ThreadLocals will cause problems if the user is not careful we should avoid it - especially in the public API. The goal should always be to design an API in such a way that the user can not shoot himself into the foot. Not everyone has the time / is willing to read the documentation carefully. Being surprised by some "out of PermGen" errors in production is not so funny - especially if you don't have any clue where this could come from, as you usually have may different libraries in use.

* @return PdfBoxUserAgentFactory instance
*/
public PdfBoxUserAgentFactory getPdfBoxUserAgentFactory() {
return PdfBoxUserAgentFactory.instance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats bad API design. It also breaks the builder contract (i.e. everything but build() always returns PdfRendererBuilder for method chaining). Better would be

  • Just define the UserAgentFactory interface (as above)
  • The builder should have a field userAgentFactory with a default implementation assign i.e.
        _userAgentFactory = new UserAgentFactory() {
            public PdfBoxUserAgent getPdfBoxUserAgent(PdfBoxOutputDevice outputDevice) {
                return new PdfBoxUserAgent(outputDevice);
            }
        };
  • A setter to override the default implementation. i.e.
public PdfRendererBuilder setUserAgentFactory(UserAgentFactory customFactory) {
  this._userAgentFactory = customFactory;
  return this;
}

Of course it would require an additional parameter to the PdfBoxRenderer....

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suspicion that you just did not understand the code above.
"build() always returns PdfRendererBuilder for method chaining"
Of course it returns always a PdfRendererBuilder, it should be used in this manner:

    PdfBoxRenderer renderer = null;
    try {
        builder.getPdfBoxUserAgentFactory().registerGlobalPdfBoxUserAgentFactory(new PdfBoxUserAgentFactory.UserAgentFactory() {
            @Override
            public PdfBoxUserAgent getPdfBoxUserAgent(PdfBoxOutputDevice pdfBoxOutputDevice) {
                return new ExtendedPdfBoxUserAgent(pdfBoxOutputDevice);
            }
        });
        renderer = builder.buildPdfRenderer();
        //customUserAgent(renderer);
        renderer.layout();
        renderer.createPDF();
    } finally {
        if (renderer != null)
            renderer.close();
    }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@delafer Sorry, but where is the unregisterPdfBoxUserAgentFactory() call? Oh, you forgot it. So a class loader leak is incoming on WAR reload ... This is what I mean by bad API design, if it's easy to forget something you will forget to do it... and the problem will only manifest after the n-th reload of the WAR..

Yes, creating an idiot proof API is hard, but should be the goal never the less.

Method chaining means: builder.setXY().setAbc().setWhatever().build(). For another example look here: https://google.github.io/gson/apidocs/com/google/gson/GsonBuilder.html

@rototor
Copy link
Contributor

rototor commented Apr 8, 2018

FYI: If you want to get a guess what all strange kind of problems you get with singletons and ThreadLocals just read the documentation about this in the Finalzer class of Guava:

https://github.com/google/guava/blob/master/guava/src/com/google/common/base/internal/Finalizer.java

@rototor
Copy link
Contributor

rototor commented Apr 8, 2018

Hmm, the more I think about this, the more I don't like it to let the user override the UserAgent. In FlyingSourcer there was no clean API to do things, you were more or less forced to directly override internals of the library. There were even multiple ways to do the same thing.

In OpenHTMLToPDF we try to provide a clean API (namely the *Builders and *Renderers) to customize all stuff you need to customize. The UserAgent is way to internal to be overridden by the user. If you need some functionality which would require you in FlyingSourcer to override the UserAgent we should provide a clean way with the Builder.

In terms of Java 9+ modules: Only PdfRendererBuilder and PdfBoxRenderer should be accessible, same for the Java2D-classes. Of course, all interfaces etc you can give to the Builders like the FSObjectDrawerFactory, FSDOMMutator and so on must be public and accessible too. Everything else is internal and should not be accessible to outside of the OpenHTMLToPDF library itself. Also openhtmltopdf-core should mostly only be accessible by openhtmltopdf-pdfbox and openhtmltopdf-java2d (in Java 9+ modules speak, with Java 8 you can not really restrict this beside using the very complicated OSGi).

Putting such hard restrictions on the accessibility of library internals would allow us to later refactor many internals without breaking user code. What do you think about this @danfickle?

For this reason I NAK the overall idea of this pull request.

@delafer
Copy link
Author

delafer commented Apr 9, 2018

I think that you have sharply reacted to my critic. And that's why they responded reciprocally, beginning to criticize my proposal. This is unproductive. I assumed that my pull-request would be rejected. Therefore, I'm not surprised. Programmers do not like criticism. Most people believe that they write nice well-designed and thoughtful code. I don't want to debate who "has bigger balls". Of course you, as a owners of the project. in theory this open-source project, in real life this project is developed by 2 people with a very strange ideas about software / library design.

@delafer
Copy link
Author

delafer commented Apr 9, 2018

Hmm, the more I think about this, the more I don't like it to let the user override the UserAgent. In FlyingSourcer there was no clean API to do things, you were more or less forced to directly override internals of the library. There were even multiple ways to do the same thing.

In OpenHTMLToPDF we try to provide a clean API (namely the *Builders and *Renderers) to customize all stuff you need to customize. The UserAgent is way to internal to be overridden by the user. If you need some functionality which would require you in FlyingSourcer to override the UserAgent we should provide a clean way with the Builder.

if the resource in the html template does not starts with the protocol name and instead of http://www..xyz.jpg defined not a absolute, but relative path e.G. url="/xyz/myImage.jpg" new library throws NPE exception, flyingsaurcer works without exceptions. this is only one example of a dozen others, where openhtmltopdf falls with NLP exceptions. You have may be a good common ideas, but Implementation leaves much to be desired.

P.S. and think about such thing. I have specially added only 1 method with 1 line of code and changed another 1 line of code and no more. I added my own class, but I tried to leave the library and its code in its original form. No refactoring. No global changes. I have left even a monster constructor with 18 arguments, anyway my pull request will be most likely rejected... ) about extensibility / expandability - this library is almost not expandable.. and not because API, but IMHO: because of heavily use of structured programming patterns instead of classic OOP.

e.G. I'd like to load resources not even from classpath, but from DB / database or through custom, not implemented in java protocol like SFTP.. With flying saurcer I can write a few lines of code to extend default UserAgent, but in openhtmltopdf you have only URL resolver. Could you suggest any solution how to load resource via URL from DB ? ) or password protected resource, or something else ...

@rototor
Copy link
Contributor

rototor commented Apr 9, 2018

@delafer I am not the owner of the project, @danfickle is. Regarding the constructor, I've created #199. Yes, that constructor is code smell - but a private internal API anyway...

If you have some stuff which worked without problems in FlyingSaucer but crashes here you should open issues specific to those problems. The code sample I posted in #196 is for resolving absolute / relative URLs without protocol on the class path. Currently OpenHTMLToPDF does not resolve files in the class path itself. It could use Thread.getContextClassLoader() for this, should maybe even work in a container context.

I do not want to argue with you. This is an open source project were people work on this in their free time for FREE. If you don't like they way the project develops you don't need to use it. Or you can fork it and develop it the way you think it would be better.

@danfickle
Copy link
Owner

OK, firstly, let’s keep it nice and professional.

Secondly, thanks @delafer for your contribution. I broadly agree with most of your critique, I have to focus more on refactoring and testing than adding ever more features. One of the problems with a project of this size with very few tests, refactoring is fraught with danger. In hindsight I should have written more tests at the beginning.

In regard to the npe on uri resolving, it should work provided null isn’t passed in as the base uri. If it isn’t please post an issue with minimal example code.

In regard to this pull-request, while I try to accept most prs, I think that overriding the user agent is not a good idea. It is currently too messy and therefore unstable, so would break user code too often (when we changed the interface). For example, the caching system needs a big overhaul.

Now there are two ways we could enable your very valid use-case. We could refactor the pluggable Http/s stream resolver and make it for all protocols. Ie, the user passes in protocol(s) and a stream factory to use for that protocol. Alternatively, we could bring forward the cache work. As detailed in #170 the multi threaded cache would be backed by a concurrent hash map and store a uri to byte array mapping. It could be pre loaded with whatever the user required.

What does everyone think of these options?

Again, let’s keep it nice.

@danfickle
Copy link
Owner

I’m going to close this pr, but feel free to keep civil discussion in this thread or on #196

@danfickle danfickle closed this Apr 9, 2018
@rototor
Copy link
Contributor

rototor commented Apr 9, 2018

Regarding custom protocols I would suggest an API like this:

interface FSUrlOpener {
   InputStream openStream(URL url) throws IOException;
   Reader openReader(URL url) throws IOException;
}

with a default implementation like this:

public final class DefaultUrlOpener implements FSUrlOpener {
  ... _streamFactory;
   InputStream openStream(URL url) throws IOException {
       	if (url.getProtocol().equalsIgnoreCase("http") ||
		url.getProtocol().equalsIgnoreCase("https")) {
		return _streamFactory.getUrl(uri).getStream();
	}
       return url.openStream();
   }

   Reader openReader(URL url) throws IOException {
	if (url.getProtocol().equalsIgnoreCase("http") ||
		url.getProtocol().equalsIgnoreCase("https")) {
		return _streamFactory.getUrl(uri).getReader();
	}
       InputStream is = url.openStream();
       return is == null ? null : new InputStreamReader(is, "UTF-8");
   }
}

and an builder method to define custom UrlOpener:

builder.useUrlOpener(new FSUrlOpener(){
  // ... implement both methods and inspect url.getProtocol() to see what kind of transport you need.
  // You can always call url.openStream() if you don't care about a protocol and 
  // just want it to work (e.g. for class path resources)
});

If and how we cache the data returned here is an implementation detail, which we can change later.

(Regarding DefaultUrlOpener being final: It requires the streamFactory, so a user deriving from DefaultUrlOpener would be forced to provide some streamFactory even if he does not care. I am not happy with that, better ideas are welcome)

So a user could customize how the URIs are built using useUriResolver() and customize how this URIs are opened using useUrlOpener().

@danfickle
Copy link
Owner

@rototor Just saw this, but my implementation turned out similar! Feel free to suggest changes. Thanks.

@rototor
Copy link
Contributor

rototor commented Apr 9, 2018

@danfickle I think that is fine. I would just suggest renaming HttpStream(...) to FSStream(...) as it's not HTTP only anymore.

@delafer
Copy link
Author

delafer commented Apr 9, 2018

4ad3f0dc9b8db3eb0e3350d731dd7a24_400x400 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants