EQL: Remove pattern == "wild*card"
#62651
Comments
Pinging @elastic/es-ql (:Query Languages/EQL) |
If I were to take a SWAG, I would say approximately 95%+ of queries use this behavior currently. |
I got some more stats
The ease of wildcards was intentionally baked into EQL from day one, so I think we should approach it very carefully before we make plans to change this behavior. And it could be good to share some context to better understand the expectations and desires of EQL users and rule writers. When writing detections, you often don't have the full text or know exactly what you're looking for, so wildcards are important. Wildcards are so prevalent that we deliberately made them first class, letting I think it shows success that we've made it this long without needing to express a literal asterisk in a string comparison. All of these concerns have been hypothetical, and that really goes to show the value of an asterisk as a wildcard character. And we aren't the only database/query language to recognize how its importance outweighs the utility of a literal asterisk. Addressing some of the side effects you mentioned:
There are workarounds to do this, but I will concede that they aren't ideal. One example:
Agreed, but I'm not sure how this is a unique side-effect
String equality is semantically identical to a wildcard comparison but without any wildcard characters, so I don't see an issue here. If the example you provided,
Yes, but this sounds like a feature not a bug. Outstanding concerns
It's impossible to deprecate this syntactic sugar, without coming up with a new character for quotes. Not to be a broken record, but we can't quietly (aka no messages) change the meaning of existing syntax. Otherwise, users will have zero idea why queries that used to work no longer work. TL;DR |
I agree that this is an important feature @rw-access
but I cannot go over the inconsistency it adds to the language. The examples @costin provided show that a single use of the language could translate into two separate behaviors that cannot be reasoned about in a deterministic way. Is the user intention when writing |
This is a difficult decision, because unlike replacing backquote with single quote, where we have a clear behaviour and an error message, if we make this change users could end up with wrong results. On the other hand, I'm not a fan of such syntactic sugar because although it can ease the life of a user in the common case, it can have ambiguous semantics as for example in the case of Personally I'd always prefer clear semantics to user convenience. Imho. the Taking into consideration the fact that the change of this behaviour is "silent" and can lead to unexpected results, if we decide to make the change, and I'd vote towards it, let's make it now in the experimental stage, otherwise it will be impossible. |
tl;dr
I don't want to get side-tracked by the actual details and timeline (deprecation, compatibility flags, etc...). To your point on:
The above is basic math using an operator association and commutativity on literals. With fields it becomes even more confusing:
Which end up breaking transitivity:
This is a big deal. Essentially
Proposal
Introduce a separate operator, say
Pros:
Cons:
|
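The transitivity concern raised above can be sketched in Python. This is only an illustration under stated assumptions, not the examples from the comment: `sugared_eq` is a hypothetical helper that models `==` as a wildcard check whenever either operand contains `*`, and it ignores case handling.

```python
import re


def sugared_eq(left: str, right: str) -> bool:
    """Hypothetical model of ==: an operand containing * acts as a wildcard pattern."""
    for pattern, value in ((right, left), (left, right)):
        if "*" in pattern:
            # Escape the literal chunks and let each asterisk match any run of characters.
            regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
            return re.fullmatch(regex, value, flags=re.DOTALL) is not None
    # No asterisk on either side: plain string equality.
    return left == right


a, b, c = "wildcard", "wild*", "wildfire"
print(sugared_eq(a, b))  # True: "wildcard" matches the pattern "wild*"
print(sugared_eq(b, c))  # True: "wildfire" matches the pattern "wild*"
print(sugared_eq(a, c))  # False: two different plain strings
```

Under this model, a == b and b == c both hold while a == c does not, so the operator no longer behaves like an equivalence relation.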
I surveyed the primary users (@elastic/security-intelligence-analytics) of EQL, and it was unanimous (12/12) that This was a deliberate design decision and is one that is expected and desired. Maybe the We can talk more about this offline, but I think |
A few more examples of
|
I find the semantics of |
To me, this is a hard decision. On one hand, I agree that On the other hand, if we make this change and then all our example rules in the docs use So I, for one, don't have a clear opinion yet. I'm trying to speak to more people to understand how off-putting having to always use |
I'll echo @rw-access's comments, speaking for the security user. Elastic's Security Solution is better if we leave this pattern in place. Wildcards are extensively employed by security users. I'm unaware of users disapproving of the behavior in place today. They've appreciated the simplicity, readability, and ease of use in many EQL features, including our treatment of wildcards. Echoing @tsg, these usability factors with EQL and Detection Rules directly support one of the Solution's primary differentiators. We did it this way for a good reason. I'd expect, but don't want to assume, that the other Solutions would agree. For what it's worth, Splunk does it like we do today (@rw-access edit: for both wildcards and case-insensitivity by default. Also, Microsoft KQL has this behavior for wildcards, although their usage is limited to prefix queries). They allow users to search for asterisks via character escaping. This implies they came to a similar conclusion about what users expect and want. |
There are some bugs here:
All string comparisons for Python EQL (and other Elastic Endgame implementations) are case-insensitive. This Originally, range queries ( I don't see that as an issue, just a matter of defining behavior. Comparing strings lexicographically is pretty rare, but it does exist. I don't think we need to overextend for the one use case. |
First off, thanks for engaging, folks! I recognize the security focus of EQL; however, the vision for Elasticsearch EQL is to grow beyond that and cater to any user of Elasticsearch. @mark-dufresne
I'm not too familiar with Splunk; my understanding is the context/command/pipe defines the semantics. The wildcard you linked to is applied only inside a
Unless I'm looking at the wrong language, the documentation indicates
The
Same for wildcards:
I've used
Another example of unexpected behavior that surprises even power users.
This is inaccurate.
Moving forward
Folks, inside QL we've given quite a lot of thought to this topic, starting with #54411 in March. Yet despite our best efforts, so far we haven't found a solid solution. What's clear to me is EQL should out of the box provide case-insensitive equality check with wildcard support. Hence why I propose the following:
Pro:
Cons:
Let me know what you think. |
Thanks everyone for the discussion. |
Currently EQL supports the following pattern for doing wildcard comparisons:
field == "wild*card"
which translates to wildcard(field, "wild*card").
That is, making a comparison (== or !=) to a string that contains * translates to a wildcard check.
This is convenient; however, it has a few side effects:
*
.foo == concat("wild", "*", "card")
foo == "wild*" or foo == "*card" vs wildcard(foo, "wild*", "*card")
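A minimal Python sketch of the translation described above may help make the side effects concrete. It is a model under assumptions, not the Elasticsearch implementation: `wildcard` and `sugared_eq` are hypothetical helpers, `*` is taken to match any run of characters, and case handling is ignored.

```python
import re


def wildcard(value: str, *patterns: str) -> bool:
    """Return True if value matches any pattern, where * matches any run of characters."""
    for pattern in patterns:
        # Escape the literal chunks, then join them with ".*" for each asterisk.
        regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
        if re.fullmatch(regex, value, flags=re.DOTALL):
            return True
    return False


def sugared_eq(value: str, literal: str) -> bool:
    """Hypothetical model of ==: a literal containing * becomes a wildcard check."""
    if "*" in literal:
        return wildcard(value, literal)
    return value == literal


values = ["wildcard", "wild*card", "wild-card", "wild"]

# Every value that merely fits the pattern matches, so the literal string
# "wild*card" cannot be selected on its own with ==.
print([v for v in values if sugared_eq(v, "wild*card")])

# In this model, two sugared comparisons joined by "or" agree with a single
# multi-pattern wildcard call.
for v in values:
    assert (sugared_eq(v, "wild*") or sugared_eq(v, "*card")) == wildcard(v, "wild*", "*card")
```

The first print shows three of the four values matching, which is the exact-match limitation: once `*` is always a wildcard inside `==`, a literal asterisk needs some other mechanism, such as escaping or a separate operator.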
It would be good to get some numbers to see how often this pattern is being used.
My proposal is to deprecate/remove this syntactic sugar and promote explicit use of wildcard for clear semantics and clarity in intent.