Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let JarFileLocation work with custom ClassLoader URIs #1131

Merged
merged 6 commits into from
Aug 9, 2023

Conversation

odrotbohm
Copy link
Contributor

IMPORTANT: I did not add any new test cases to verify the scenario described in the following, as I was unable to get the project imported into my IDE. The existing tests still succeed so that I feel confident I didn't break anything. Furthermore, a downstream sample project producing the described scenario works with the fix applied and previously failed.

Some ClassLoaders that work with repackaged JAR files return custom resource URIs to indicate custom class loading locations. For example, the ClassLoader in a packaged Spring Boot's returns the following URI for source package named example: jar:file:/Path/to/my.jar!/BOOT-INF/classes!/example/. Note the second "!/" to indicate a classpath root.

Prior to this commit, JarFileLocation was splitting paths to a resource at the first "!/" assuming the remainder of the string would depict the actual resource path. That remainder potentially containing a further "!/" would prevent the JAR entry matching in FromJar.classFilesBeneath(…) as the entries themselves do not contain the exclamation mark.

This commit changes the treatment of the URI in JarFileLocation to rather use the last "!/" as splitting point so that the remainder is a proper path within the ClassLoader and the matching in FromJar.classFilesBeneath(…) works properly.

@odrotbohm
Copy link
Contributor Author

@codecholeric @hankem – This was originally reported against Spring Modulith in spring-projects/spring-modulith#221. The fix would be crucial to get our runtime support working. I'd appreciate any estimations on a release including that fix (potentially rather a 1.0.2 than a 1.1?) as we find ourselves blocked from releasing Spring Modulith 1.0 RC1.

@odrotbohm odrotbohm changed the title Let JarFileLocation work with custom ClassLoader URIs. Let JarFileLocation work with custom ClassLoader URIs Jul 4, 2023
@odrotbohm odrotbohm changed the title Let JarFileLocation work with custom ClassLoader URIs Let JarFileLocation work with custom ClassLoader URIs Jul 4, 2023
odrotbohm added a commit to spring-projects/spring-modulith that referenced this pull request Jul 20, 2023
It includes the fix submitted for TNG/ArchUnit#1131. To be removed once we can upgrade to a released version of the fix.
@codecholeric
Copy link
Collaborator

Hey @odrotbohm, again, sorry that it took me so long right now 🙈 My problem was that I couldn't really overview the whole impact, because I never anticipated multiple separators !/ within the ClassLoader URLs. Thus, I wanted to be careful that I really understand this and the impact.
To understand it I decided to add some real "3rd party tests" for this, using the Spring Boot ClassLoader, but then my journey down the rabbit hole began 😬 I didn't know you could customize URL handling that heavily (and transparently), and once I saw that there was a custom Spring JarFile, JarURLConnection, URLStreamHandler and so on I started to ponder if this should even go into core ArchUnit, or if it would be a cleaner approach to define some extension point that Modulith could implement to add this custom URL handling in any way that makes sense. I'm still not completely sure about this, but I figured that using the last separator !/ for now doesn't destroy anything and if it fixes the problem at hand, then that's an okay compromise between "being clean" and being pragmatic 🤷

Anyway, there is one case that currently doesn't work though (I've documented this in a test). If the directory entries within the JAR file are missing, then the respective classes within a certain package won't be found correctly. I had this problem in the past (e.g. the old rt.jar decided to leave out the directory entries). Back then I implemented some workaround to scan through all the JARs on the classpath as well and look for entries that start with the respective prefix. But this won't work in this case, because without custom knowledge about the Spring Boot JAR layout it's impossible to know that entries need to be taken relative to BOOT-INF/classes only. And that IMHO crosses the point where it should really be a framework dependent extension point, if we want to support this, too.
But since I don't even know if there ever are Spring Boot JARs out there that miss the directory entries (e.g. if the directory entry BOOT-INF/classes/ is missing AFAIS deriving the nested archive crashes anyway), I'm also fine to add it like that for now and see if there ever will be a follow up issue. Except if you with your background knowledge about Spring can already tell that this will definitely be a problem, then we can ponder if we improve this right away. But then also maybe in a follow up PR so the problem at hand is fixed at least.

Anyway, I tried to make some improvements to the test infrastructure on the way and then added some more tests, I'd be happy if you can take a look if this makes sense to you. The last commit I would squash onto your's if that's alright, since it adds extra tests for the concrete fix.

In any case, I think it would make sense to add to your commit message also a note, that using

jar:file:/foo/bar.jar!/BOOT-INF/classes!/

as a root JAR URL (where originally only a real root URL of a JAR file was supported, e.g. jar:file:/foo/bar.jar) works because of the custom Spring Boot URLStreamHandler magic. Because that was the most puzzling for me in the beginning, why just using this "composite" JAR URL as new URL(..) would suddenly be able to derive the nested archive correctly as ClassFileSource 😂 Took me quite some time to find that piece of the puzzle.

@codecholeric
Copy link
Collaborator

PS: Aaaand I see that my tests of course fail on some platforms 🤦‍♂️ URL handling is so much fun to do cross-platform and across all JDK versions 😬 Gonna look into that tomorrow! I think you can still review it if it generally makes sense from your point of view...

@codecholeric codecholeric force-pushed the fix-jar-class-lookup branch 2 times, most recently from 730066b to 912f1f1 Compare August 6, 2023 10:07
@codecholeric
Copy link
Collaborator

I think by looking at the test fails I actually found a way to fix the "classes in packages can't be found if there are no directory zip entries" problem 🙂 Because, there was a "legacy way" that actually looked through the context for URLClassLoaders and considered all their URLs as part of the classpath. For some reason I removed this behavior >= JDK 9 (I think it was only because I thought the URLClassLoader would lose it's wide usage post JDK 8 and it wouldn't make a difference anyway cause the classpath property typically has the same URLs).
But with this behavior in place we actually obtain URLs like jar:file:/some/path.jar!/BOOT-INF/classes!/ from the Spring Boot ClassLoader, and these again with the custom Spring Boot URL handling in place will stream their entries correctly so that also classes in packages without directory zip entry will then be found 🥳 (at least that's what the tests so far suggest to me...)
Anyway, let me know what you think! If all this makes sense for you I would rearrange the commit history a little once again to put the legacy behavior in action for JDK > 8 in a separate commit and with a clear explanation why.

@odrotbohm
Copy link
Contributor Author

I know too little about the way Boot sets up the class loading in the fat JARs they create. I've asked @wilkinsona to comment, but am not sure when he's going to find time.

Personally, I don't have any objectives except getting ArchUnit to work with classes in a running Spring Boot application. Given that this particular context should be a rather rare case for now (and only affects projects using Spring Modulith's runtime support), I agree that the focus should be on making sure that the currently already working scenarios still continue to work. In other words: if we do not catch all corner cases of the “run ArchUnit within a Boot app” is just fine.

@wilkinsona
Copy link

The changes here look good to me. I can't see anything that's Spring Boot specific in the main code and that's the way it should be.

Regarding jars without directory entries, that can happen with Spring Boot. It shouldn't happen with the "outer" jar as it's created by our build plugins and they will create directory entries. It may happen with jars nested in BOOT-INF/lib, however, as we do not have any control over their contents.

The code in Spring Framework's PathMatchingResourcePatternResolver may be of interest here. For example, its addAllClassLoaderJarRoots method is similar to the changes that are proposed here for working with URLClassLoader and finding jars on the classpath.

@codecholeric
Copy link
Collaborator

Thanks for the feedback, then I'm gonna get this ready and create a bugfix release!

codecholeric and others added 6 commits August 9, 2023 01:07
According to the ZIP spec (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT -> 4.4.17.1) file entries in a ZIP file must not start with a leading slash '/'. For tests that purely want to iterate the entries, this problem did not manifest in any way (there is no exception, etc., if an entry starts with '/'). But if such an illegal JAR is used within an `URLClassLoader` it doesn't work correctly and the classes can't be discovered.

Signed-off-by: Peter Gafert <[email protected]>
One way how we tackle scanning class files from the classpath is asking the context `ClassLoader` for `getResources(..)`. We now add an explicit test for this. We also document by a test that `getResources(..)` doesn't do what we want if the directory entries are missing from a JAR and we use some `ClassLoader` derived from `URLClassLoader`. When creating JARs we can choose if we want to add ZIP entries for the folders as well or skip them and only add entries for the actual class files. But in case we're not adding those directory entries, any `URLClassLoader.getResources(..)` will return an empty result when asked for this directory. This unfortunately makes the behavior quite inconsistent. We have some mitigation in place to also analyze the classpath and scan through the JARs on the classpath with a prefix logic that ignores if the entries for the directory are present. But in case we really only have a customized `ClassLoader` without any directory entries in a JAR there is not much the `ClassLoader` API allows to do.

Signed-off-by: Peter Gafert <[email protected]>
It is possible to quite heavily customize URL handling by extension points like the system property `java.protocol.handler.pkgs`. Thus, it's safer to try to limit manual URI manipulations and file creations as much as possible and derive it from objects that participate in this customization. E.g. use the (possibly customized) `JarURLConnection` obtained from the URL to retrieve the `JarFile` instead of creating a new `File` from the URL and converting this to a `JarFile` again.

Signed-off-by: Peter Gafert <[email protected]>
Back when JDK 9 support was implemented I assumed that the `URLClassLoader` would lose some of its omnipresent occurrence and thus dropped retrieving URLs from all `URLClassLoader`s in context (I also assumed that the content of the system property `java.class.path` would coincide with those URLs anyway in all cases where ArchUnit is used).
However, for once `URLClassLoader` is still widely used today and secondly retrieving the URLs from such `ClassLoader`s makes a real difference as soon as they are customized like Spring Boot's `ClassLoader`. Because in those cases the URLs from the classpath system property can differ quite a bit from the ones returned by the custom `URLClassLoader`. In Spring Boot's case it can e.g. return nested archive URLs from fat JAR URLs, which are then also resolved correctly when opening the stream by custom URL stream handling. In any case, it makes sense to restore the legacy behavior for JDK < 9 also for all JDKs >= 9 and take the URLs from `URLClassLoader`s into account, so we can support such customized cases better.

Signed-off-by: Peter Gafert <[email protected]>
We use different approaches in different places to obtain base, path or both from a JAR URI (i.e. the part up to the separator denoting where the JAR resides versus the part after the separator denoting the path within the JAR file). We now unify this to make the connection clearer in code and use simple character splitting instead of Regex.

Signed-off-by: Peter Gafert <[email protected]>
Some ClassLoaders that work with repackaged JAR files return custom resource URIs to indicate custom class loading locations. For example, the ClassLoader in a packaged Spring Boot's returns the following URI for source package named example: jar:file:/Path/to/my.jar!/BOOT-INF/classes!/example/. Note the second "!/" to indicate a classpath root.

Prior to this commit, JarFileLocation was splitting paths to a resource at the first "!/" assuming the remainder of the string would depict the actual resource path. That remainder potentially containing a further "!/" would prevent the JAR entry matching in FromJar.classFilesBeneath(…) as the entries themselves do not contain the exclamation mark.

This commit changes the treatment of the URI in JarFileLocation to rather use the *last* "!/" as splitting point so that the remainder is a proper path within the ClassLoader and the matching in FromJar.classFilesBeneath(…) works properly.

Note that in composition with custom `ClassLoader`s frameworks like Spring Boot can also install custom URL handling. In this case ArchUnit can read a class file from a nested archive URL like `jar:file:/some/file.jar!/BOOT-INF/classes!/...` using the standard `URL#openStream()` method in a completely transparent way (compare setup in `SpringLocationsTest`).

Signed-off-by: Oliver Drotbohm <[email protected]>
@codecholeric
Copy link
Collaborator

Okay, as a last test I've now added a JAR without directory entries to the BOOT-INF/lib folder and that class also could be found correctly. So I would expect it's at least a reasonable shot now 😁 (since it's URL handling I'm already waiting for the first "but on platform x with version y" bug report, but hey... One issue at a time 😁)

@codecholeric codecholeric merged commit 1781127 into TNG:main Aug 9, 2023
21 checks passed
@codecholeric
Copy link
Collaborator

Just FYI: I decided to just release 1.1.0 with this fix. Because some stuff has accumulated meanwhile and I don't think it makes sense to separate this for a bug fix release (the other changes are either also bug fixes or "low danger" improvements)

@odrotbohm
Copy link
Contributor Author

Confirming 1.1.0 solves the original problem! Thanks, Pete! 🥳🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants