NIP-50: Adding regex extension #1480

Open
wants to merge 2 commits into
base: master

kehiy
Contributor

@kehiy kehiy commented Sep 3, 2024

Regular expressions are commonly used for querying and searching, which is a good reason to offer them as an extension so clients can make specific, complex search queries.

@kehiy
Contributor Author

kehiy commented Sep 3, 2024

Also, a question: do we have a way to determine whether a set of key:value pairs in the search field are extensions or part of the query?
Or do we have a convention, for example that the search query must be at the end or the beginning of the search field? Can we have multiple extensions in one search field? Can an extension sit beside a normal text search query?

If yes, where are these rules defined? If no, should we talk about them here, maybe based on current implementations?
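One possible convention for the question above, as a minimal sketch rather than anything NIP-50 mandates: treat a whitespace-separated token as an extension only when its key is in a set the relay recognizes, and leave everything else as query text. The function name `split_search` and the contents of `known_keys` are illustrative assumptions.

```python
from __future__ import annotations


def split_search(
    search: str,
    known_keys: frozenset[str] = frozenset({"include", "language", "domain"}),
) -> tuple[dict[str, list[str]], str]:
    """Split a search string into (extensions, free_text).

    Assumption: a token counts as an extension only if it looks like
    key:value AND the key is one this relay recognizes. Anything else,
    including URLs that happen to contain a colon, stays in the text query.
    """
    extensions: dict[str, list[str]] = {}
    terms: list[str] = []
    for token in search.split():
        key, sep, value = token.partition(":")
        if sep and value and key in known_keys:
            # The same key may appear more than once, so collect a list.
            extensions.setdefault(key, []).append(value)
        else:
            terms.append(token)
    return extensions, " ".join(terms)


extensions, text = split_search("best nostr apps language:en include:spam")
```

Under this convention extensions can appear anywhere in the field and in any number, and a token such as `https://example.com` is not mistaken for an extension because `https` is not a recognized key.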

@vitorpamplona
Collaborator

Do you intend to build a relay that can do regexes?

@kehiy
Contributor Author

kehiy commented Sep 4, 2024

@vitorpamplona Currently I'm working on the Immortal implementation. It's at an early stage and still fairly empty. But NIP-50, and especially regex pattern matching, is part of our plan: we want to support complex queries inside the text fields themselves, in addition to queries over whole events.

@vitorpamplona
Collaborator

vitorpamplona commented Sep 4, 2024

2 points:

Search engines are not usually regex-friendly. You might want to look at how tools like Lucene, Solr, and Elasticsearch work to get a firmer basic understanding of how a full-text search engine can be built. The key component is indexing, and regex is just too complex to allow pre-indexing. You will be left with applying the regex to all events on every new query.

Regex is not a standard. Although the basic features work in all languages, there are plenty of grammar specializations going on. The basic features like x*, [abc], (capture groups), are widely supported. Character classes like \d, \w, \s are kinda supported, but, for example, \z means "strict end of string" in Perl but is unsupported in JavaScript. In C, \d is not supported in GNU regular expressions, and backslash character classes like \w, \s, etc. are not supported at all in pure POSIX regexes; you must use [_[:alnum:]], [[:space:]], [[:digit:]], etc. For advanced things like negative lookaheads, it's the same story. They are supported in Perl-compatible regexes (PCREs) and in JavaScript, but not in POSIX or in command-line tools like grep in its default mode. You might want to define exactly which version you want to use or which features must be enabled.
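Since flavors differ, a relay can only honor patterns that its own engine accepts. A minimal sketch of that check, assuming a relay written in Python and using the standard `re` module (the helper name `compile_or_reject` is hypothetical):

```python
import re


def compile_or_reject(pattern: str):
    """Return a compiled pattern, or None if this engine cannot parse it."""
    try:
        return re.compile(pattern)
    except re.error:
        # The pattern is malformed, or uses syntax this flavor lacks.
        return None


# Negative lookahead is accepted by Python's `re`; an engine without
# lookahead support would reject the very same pattern string.
lookahead = compile_or_reject(r"foo(?!bar)")

# A malformed pattern is rejected instead of crashing the query handler.
broken = compile_or_reject("(unclosed")
```

Validating before searching lets the relay refuse an unsupported pattern explicitly rather than silently returning wrong results.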

@kehiy
Contributor Author

kehiy commented Sep 4, 2024

@vitorpamplona Thanks a lot for your explanation! Let me do more research on the different regex versions and make a final decision on the change. I'll let you know once it's updated.

@kehiy
Contributor Author

kehiy commented Sep 4, 2024

@vitorpamplona I did some research on how different regex flavors are supported. PCREs are supported by most programming languages, and MySQL and Postgres support them too. Mongo, Redis, and Elastic support regex partially, and the flavors they support are supersets or simplified versions of PCREs, as I understand it.

The POSIX flavor is mostly used in grep and similar tools, which I don't think we need to support in this specific case(?), since relays mostly do this pattern matching in the database or in code.

So I decided to use PCREs here.

Also, about DBs that are not regex-friendly: I think domain-specific databases for Nostr events could maybe provide full, high-performance support for this case. Also consider that relays only MAY support this extension; it is optional. For specific cases, mostly chats, I think regex is sometimes very helpful!

@kehiy
Contributor Author

kehiy commented Sep 8, 2024

Any ideas, guys?

@fiatjaf
Member

fiatjaf commented Sep 8, 2024

Databases generally use indexes in order to find things fast. To do a regex match you can't use indexes, so they would have to load every single stored record and check it against the regex. Even if MySQL supports it, that is probably what it does underneath, which for a big enough dataset would be prohibitively costly and slow.
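The cost difference can be sketched with a toy in-memory store. This is an illustration only, not how any particular relay or database works: `EVENTS`, `build_index`, `term_search`, and `regex_search` are made-up names, and the "index" is a plain dictionary of token to event ids.

```python
import re
from collections import defaultdict

# Hypothetical stored events: id -> content.
EVENTS = {
    1: "gm nostr",
    2: "zaps are fun",
    3: "nostr relays and zaps",
    4: "hello world",
}


def build_index(events):
    """Inverted index: token -> set of event ids containing that token."""
    index = defaultdict(set)
    for event_id, content in events.items():
        for token in content.lower().split():
            index[token].add(event_id)
    return index


def term_search(index, term):
    """One dictionary lookup; no stored event is read."""
    return sorted(index.get(term.lower(), set()))


def regex_search(events, pattern):
    """No index applies, so every stored event has to be examined."""
    compiled = re.compile(pattern)
    examined = 0
    hits = []
    for event_id, content in events.items():
        examined += 1
        if compiled.search(content):
            hits.append(event_id)
    return sorted(hits), examined


index = build_index(EVENTS)
by_term = term_search(index, "zaps")
by_regex, examined = regex_search(EVENTS, r"za+ps?")
```

Both searches find the same events here, but `examined` always equals the total number of stored events for the regex path, which is the full scan described above.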

If you want to support this, then sure, it's your choice, we can leave this PR open and merge it later if more people start using it too.

@kehiy
Contributor Author

kehiy commented Sep 8, 2024

@fiatjaf OK, I'll keep it open and wait to see it in action and see how it goes.

3 participants