Problem
The Java version of this library currently fails to recognize URLs containing Unicode characters, even though browsers support such URLs and they can be registered and used. For instance, "http://www.詹姆斯.com/詹姆斯" is not identified as a valid URL. The issue arises because the Java code's URL validation does not accept Unicode characters as valid components of domain names and paths.
Solution
To address this issue, I have extended the regular expressions used for URL validation in the Java code. Specifically, I added the Unicode property classes \p{L} (letters) and \p{M} (combining marks) to the regular expressions that validate the domain name and path of the URL. With this change, the library correctly identifies and validates URLs containing Unicode characters.
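To illustrate the idea, here is a minimal sketch of how adding \p{L} and \p{M} to a character class changes what a URL pattern accepts. The class name `UnicodeUrlSketch`, the helper `build`, and the simplified pattern itself are assumptions made for this example; they are not the regular expressions actually used in the library. The test URL is "http://www.詹姆斯.com/詹姆斯", written with Unicode escapes so the source stays ASCII-only.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: a deliberately simplified URL pattern,
// not the library's real validation logic.
class UnicodeUrlSketch {

    // Builds a simplified URL pattern. "letters" is a character-class fragment
    // describing which characters count as letters (e.g. "a-zA-Z").
    static Pattern build(String letters) {
        String ch = "[" + letters + "0-9]";
        String chOrHyphen = "[" + letters + "0-9-]";
        // A domain label starts and ends with a letter or digit; hyphens only inside.
        String label = ch + "(?:" + chOrHyphen + "*" + ch + ")?";
        // At least two labels separated by dots.
        String domain = label + "(?:\\." + label + ")+";
        // Optional path made of letters, digits, and a few common punctuation marks.
        String path = "(?:/[" + letters + "0-9._~%/-]*)?";
        return Pattern.compile("https?://" + domain + path);
    }

    // ASCII-only variant: rejects non-ASCII letters.
    static final Pattern ASCII_ONLY = build("a-zA-Z");

    // Unicode-aware variant: \p{L} matches any letter, \p{M} any combining mark.
    static final Pattern UNICODE_AWARE = build("\\p{L}\\p{M}");

    // True only if the whole string matches the pattern.
    static boolean matches(Pattern p, String url) {
        return p.matcher(url).matches();
    }

    public static void main(String[] args) {
        // "\u8A79\u59C6\u65AF" is the three-character name from the example URL.
        String url = "http://www.\u8A79\u59C6\u65AF.com/\u8A79\u59C6\u65AF";
        System.out.println(matches(ASCII_ONLY, url));    // prints false
        System.out.println(matches(UNICODE_AWARE, url)); // prints true
    }
}
```

Including \p{M} alongside \p{L} matters for scripts where a visible character is a base letter followed by one or more combining marks; with \p{L} alone, such sequences would still fail to match.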
Result
With these changes, the library identifies URLs that include Unicode characters in their domain name or path as valid. For example, "http://www.詹姆斯.com/詹姆斯" is now recognized as a valid URL. This broadens the range of URLs the library can recognize and validate, and brings its behavior closer to that of modern web browsers.