Java flavor #54

Closed
moriak opened this issue Jun 25, 2014 · 23 comments

@moriak

moriak commented Jun 25, 2014

Any chance of support for Java?

Thanks for the great job.

@firasdib
Owner

Still no success :(... I could host it on a server, but that would defeat the purpose.

@Grief

Grief commented Aug 12, 2014

I just want to leave a +1 for this feature request, as your online tool is the coolest tool for regular expressions I have ever seen. I'd love to see the ability to construct regexps for Java in it.

@Supuhstar

+1 I do a lot of regexing in Java and this would be nice to have. Also, Ruby and .NET if possible :3

Donated hopefully <3

@Unihedro

+1 for the idea. The differences between Java and PCRE are that several constructs are absent ((|, (?(condition)), intersection classes are present, and character sets are written differently (\p{Upper}, \P{Upper}). Is this possible by basing it off Perl?

As mentioned by the official documentation:

The Pattern engine performs traditional NFA-based matching with ordered alternation as occurs in Perl 5.
Perl constructs not supported by this class: [...]
Constructs supported by this class but not by Perl: [...]
Notable differences from Perl: [...]
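
To make the character-class point above concrete, here is a minimal sketch of Java's POSIX-style property classes in use. The class name `PosixClassDemo`, its helper methods, and the sample strings are invented for illustration; only `java.util.regex.Pattern` and `String.replaceAll` come from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class PosixClassDemo {
    // \p{Upper} is Java's spelling of the ASCII upper-case class [A-Z].
    static boolean isAllUpper(String s) {
        return Pattern.matches("\\p{Upper}+", s);
    }

    // \P{Upper} (capital P) negates the class: remove everything that is not A-Z.
    static String keepUpper(String s) {
        return s.replaceAll("\\P{Upper}", "");
    }

    // \p{XDigit} matches one hexadecimal digit, [0-9a-fA-F].
    static boolean isHexLiteral(String s) {
        return Pattern.matches("0x\\p{XDigit}+", s);
    }

    public static void main(String[] args) {
        System.out.println(isAllUpper("PCRE"));
        System.out.println(keepUpper("Java Pattern"));
        System.out.println(isHexLiteral("0xCAFE"));
    }
}
```

A Java-aware syntax checker would need to accept these spellings even if the engine underneath is not Java's.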

Also, idea for code generation:

Pattern re = Pattern.compile( /* Pattern here with literals escaped */);
String str = /* String here */;
Matcher obj = re.matcher(str);
while (obj.find())
    System.out.println(obj.group());
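
A filled-in version of the template above might look like the following. The pattern, the input string, and the helper name `findAll` are placeholders chosen for this sketch, not output from regex101's generator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; pattern and input are made up for illustration.
class GeneratedRegexExample {
    // Compiles the regex and returns the text of every match, in order.
    static List<String> findAll(String regex, String input) {
        Pattern re = Pattern.compile(regex);
        Matcher obj = re.matcher(input);
        List<String> matches = new ArrayList<>();
        while (obj.find()) {
            matches.add(obj.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The regex \d+\.\d+ needs every backslash doubled in a Java string literal.
        for (String m : findAll("\\d+\\.\\d+", "pi is 3.14, e is 2.71")) {
            System.out.println(m);
        }
    }
}
```

The doubled backslashes in the string literal are the part a code generator would take care of.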

@firasdib
Owner

@Vincentyification I could create a syntax parser for Java and use the PCRE engine underneath. It's probably not going to work identically to Java, but pretty damn close and it will check the syntax for people.

@Unihedro

@firasdib That's a great idea, and possibly close to where I was originally going. Since the Java Pattern syntax doesn't change frequently (in fact, it has never changed since it was established in Java 5), any maintainable implementation would work for this case. I thought of this while searching for a Java library that could parse PCRE, but got sidetracked on seeing that http://regex101.com has an exclusive bullet point on Wikipedia under the "Links" section.

Since Java syntax is based on Perl, I thought a short regex could be used to check for invalid syntax that wouldn't apply in Java, and to convert Java-style character classes to Perl: [[:hex:]] -> \p{XDigit}, et cetera. As for the class intersections, I think you could get away with converting them to positive / negative lookaheads and faking the relevant sections in the "regex debugger".

As for the related ticket about Ruby regex support, there's a bigger issue with the syntax similarities, so I'm only suggesting a parser for Java input for now.
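
The idea of rewriting class intersections as lookaheads can be sketched as follows. The two patterns and the class name `IntersectionRewriteDemo` are assumptions made up for this illustration; the sketch only checks that both spellings find the same matches on Java's own engine and says nothing about how a real converter would handle nested or negated intersections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the patterns are hand-written examples.
class IntersectionRewriteDemo {
    // Java-only syntax: lower-case letters intersected with "not a vowel".
    static final String JAVA_STYLE = "[a-z&&[^aeiou]]";
    // Rewrite without &&: reject vowels with a negative lookahead, then match a-z.
    static final String LOOKAHEAD_STYLE = "(?![aeiou])[a-z]";

    // Returns the text of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "regex101";
        System.out.println(findAll(JAVA_STYLE, input));
        System.out.println(findAll(LOOKAHEAD_STYLE, input));
    }
}
```

The lookahead form should be acceptable to a PCRE-style engine as well, which is what makes it a candidate for the conversion described above.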

@FabianFrank

+1

@Khanna111

+1. I missed out on this post and created a new one. Please disregard that.

Thanks.

@nhahtdh
Collaborator

nhahtdh commented Jan 13, 2015

@Vincentyification: There are lots of f*cked-up things in the reference implementation (Oracle and OpenJDK) of Java regex that wouldn't be present in PCRE. For example: negation of a negated character class, a definition of \b that is not based on \w, capturing groups that manage to capture content inside negative look-arounds, incorrect retention of captured content, integer overflow in the length calculation that limits look-behind, high stack usage leading to StackOverflowError for greedy repetition, ... (CANON_EQ is not mentioned here, since it is a buggy and unusable feature). It will confuse users if you use the PCRE implementation here.

@Supuhstar

@nhahtdh +1 Great points!

@asmaier

asmaier commented Mar 18, 2015

+1 for explicit Java support

@niccottrell

Support for intersections in Java would be great: http://www.regular-expressions.info/charclassintersect.html (Just made a small donation re this feature)

@Claudweb

Claudweb commented Apr 7, 2015

+1 please add this

@mrslain

mrslain commented Mar 20, 2016

👍

@firasdib
Owner

The new release will feature a code generator for Java, which might help some, at least with those pesky backslashes. The PCRE engine is still as close as you're going to get, and it will likely cover most of your cases. But so far I have not been able to get the actual engine running. Sorry!

@Zarthus changed the title from "Java support" to "java flavor" on Mar 30, 2017
@Zarthus changed the title from "java flavor" to "Java flavor" on Mar 30, 2017
@Doqnach mentioned this issue on Aug 30, 2019
@xenoterracide

me too

@simonlindberg

Hi everybody! I just managed to compile the Java regex engine to JavaScript by taking the engine from the OpenJDK (java.util.regex.Pattern, java.util.regex.Matcher, etc.) and running TeaVM on it. I've sent my work to @firasdib, so hopefully he'll get it working soon 🙏 😄

It's currently based on Java 8. Does anybody know if any relevant changes have been made to the engine since then?

@simonlindberg

@OlivierJaquemet I looked at a few, and most of them seem to be performance related or regressions in later versions. I did find one that might be relevant: https://bugs.openjdk.java.net/browse/JDK-6609854.
But since Java 8 is still the most popular version, I think in that case there should be multiple Java flavors available (all the LTSs, perhaps?).
As long as the version is clearly stated, I think only Java 8 is a good start! :)

@firasdib
Owner

Thanks to the work by @simonlindberg, you can expect regex101 to support Java shortly! Keep a close eye on the website in the coming days...

@Gamebuster19901

Gamebuster19901 commented Nov 23, 2022

Can this be updated for post java 9? Should I create a new issue?

I'm having issues debugging why my regex is matching everything except for the final match which regex101 says it should be matching.

My regex: (?<amount>[\+\-]?\d*)(?<die>d\d+){0,1}(?<type>(?>(?!d\d+)[\w\s])*)

Regex101 states that the string d4d4d4 should match three times:

image

But in java it only matches twice (video below):

https://www.youtube.com/watch?v=3n-h60_WWjs

@Unihedro

Maybe don't? Your regex is fine, if you want empty strings: your regex also matches the empty string, so it technically matches 4 times, not 3 (on JDK 12):

image

The problem is not on regex101 but your code. Likely just fix how you're parsing the string and it should be good.

@Gamebuster19901

Gamebuster19901 commented Nov 27, 2022

Apparently I'm either misusing Matcher.hitEnd(), or Matcher.hitEnd() is just broken.

https://www.tutorialspoint.com/javaregex/javaregex_matcher_hitend.htm

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestMain {

	static final String test = "d4d4d4";
	static final Pattern regex = Pattern.compile("(?<amount>[\\+\\-]?\\d*)(?<die>d\\d+){0,1}(?<type>(?>(?!d\\d+)[\\w\\s])*)");
	static final Pattern regex2 = Pattern.compile("d4");
	
	public static void main(String[] args) {
		Matcher matcher = regex.matcher(test);
		do {
			matcher.find();
			System.out.println(matcher.group("die") + " hitEnd: " + matcher.hitEnd());
		}
		while(!matcher.hitEnd());
		
		
		System.out.println("==============");
		
		
		matcher = regex2.matcher(test);
		do {
			matcher.find();
			System.out.println(matcher.group() + " hitEnd: " + matcher.hitEnd());
		}
		while(!matcher.hitEnd());
	}
	
}

Which gives the output:

d4 hitEnd: false
d4 hitEnd: true
==============
d4 hitEnd: false
d4 hitEnd: false
d4 hitEnd: false
Exception in thread "main" java.lang.IllegalStateException: No match found
	at java.base/java.util.regex.Matcher.group(Matcher.java:644)
	at java.base/java.util.regex.Matcher.group(Matcher.java:603)
	at TestMain.main(TestMain.java:25)

In the first example, hitEnd() returns true prematurely, and in the second example, it never returns true. Either way, I now know this isn't a Regex101 problem.

Do you think this is a JDK issue, or am I just using hitEnd() incorrectly (or both)?
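
For comparison, here is a sketch of the more conventional loop, which uses the boolean returned by `Matcher.find()` as the loop condition instead of `hitEnd()`. The Javadoc describes `hitEnd()` as reporting whether the end of input was hit by the search engine in the last match operation, which is not the same as "no more matches". In the first example, the lookahead's `\d+` presumably scans to the end of the string during the second match, and in the second example `group()` is called after `find()` has already returned false. The class name `FindLoopDemo` and the helper `dieGroups` are invented for this sketch; the regex is the one posted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical rewrite of the test above; only the loop condition differs.
class FindLoopDemo {
    static final Pattern DICE = Pattern.compile(
        "(?<amount>[\\+\\-]?\\d*)(?<die>d\\d+){0,1}(?<type>(?>(?!d\\d+)[\\w\\s])*)");

    // Collects the "die" group of every match. An entry is null when the
    // optional group did not take part in that match (e.g. an empty match).
    static List<String> dieGroups(String input) {
        Matcher matcher = DICE.matcher(input);
        List<String> out = new ArrayList<>();
        // find() returns false once there are no further matches, so group()
        // is never called on a failed search.
        while (matcher.find()) {
            out.add(matcher.group("die"));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String die : dieGroups("d4d4d4")) {
            System.out.println(die);
        }
    }
}
```

On `d4d4d4` this loop should visit the three `d4` matches plus one trailing empty match (where `die` is null), which agrees with the four matches mentioned earlier in the thread.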
