
NIP-01 filter enhancement proposal - negation of IDs - via new filter field with either explicit or bloom filter matching #1290

Open
mleku opened this issue Jun 7, 2024 · 8 comments


@mleku

mleku commented Jun 7, 2024

this will probably take a long time to actually happen, but this is the reasoning behind why to do it:

i am running a local cache relay with this feature in nostrudel on my local machine

because nostr has no finality or consensus, even after the local cache relay sends an EOSE there is the possibility that the result has gaps in it, so clients will just ask upstream relays for the same filter anyway and get everything sent again, which is a lot of extra bandwidth

but if i could query one source first, and then append to the upstream request a set of negations of the event IDs of all the events already in the local cache, i could avoid having the remote relays send them to me again

it could possibly also be compressed using a bloom filter with a reasonably low false positive rate, so it's just one extra field in the filter: any event whose ID matches the bloom filter is not sent

this would be a boon for mobile users and users on slow or high-latency connections, minimising bandwidth usage

it is also possible that event store implementations could use this filter to avoid decoding matching events entirely, which reduces the workload on relays at the same time
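As a sketch of the idea (the `Bloom` class, its parameters, and the `events_to_send` helper are hypothetical, not part of NIP-01 or any relay implementation), the relay would test each matching event's ID against a client-supplied bloom filter and skip it on a match:

```python
import hashlib

class Bloom:
    """Minimal bloom filter sketch; a real proposal would have to agree
    on hash functions, bit size, and a wire encoding for the field."""

    def __init__(self, size_bits=8192, num_hashes=7):
        self.size, self.k = size_bits, num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # derive k independent bit positions from salted SHA-256 digests
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

def events_to_send(matching_events, seen_bloom):
    # skip anything the client (probably) already has; a false positive
    # withholds a wanted event occasionally, but junk is never resent
    return [ev for ev in matching_events if ev["id"] not in seen_bloom]
```

Note the asymmetry of the failure mode: false positives mean a client very occasionally misses an event it wanted, while true negatives always get through, which is the right trade-off for a bandwidth optimisation.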

@mleku
Author

mleku commented Jun 9, 2024

just wanted to add: thinking about this, i realised that relays can avoid re-sending events by keeping track of which events they have already sent to specific clients or IP addresses, and not resending them within a reasonable time window (5-30 minutes)

this would be even easier to make work if clients were authed to a pubkey, so the relay could filter per identity definitively, but i think even a 32-bit IP address combined with a half-hour expiry is probably fine-grained enough
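A minimal sketch of such a relay-side resend guard (the `SentCache` name, the lazy-expiry design, and the monotonic clock are my own choices, not from any relay implementation):

```python
import time

class SentCache:
    """Remember which event IDs were sent to which client key (an IP
    address or an authed pubkey) and suppress resends within a TTL."""

    def __init__(self, ttl_seconds=1800):  # 30-minute window, as suggested above
        self.ttl = ttl_seconds
        self.expiry = {}  # (client, event_id) -> time after which a resend is allowed

    def should_send(self, client, event_id, now=None):
        now = time.monotonic() if now is None else now
        key = (client, event_id)
        if self.expiry.get(key, 0.0) > now:
            return False  # sent to this client recently; skip
        self.expiry[key] = now + self.ttl
        return True
```

A production version would also need to evict expired entries eagerly (or bound the map's size), since this lazy scheme grows with every unique (client, event) pair seen.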

@vitorpamplona
Collaborator

vitorpamplona commented Jun 9, 2024

Many clients don't have a local cache. So a user who goes to page 1, then page 2, then back to page 1 within a few seconds might require the relay to resend all the events needed to rebuild page 1.

I'd say relays shouldn't try to do this. If the client is asking, just send the damn thing.

@mleku
Author

mleku commented Jun 11, 2024

> Many clients don't have local cache. So, the user goes to page 1, then page 2 and back to page 1 in a few seconds might require the relay to resend all events to rebuild page 1.
>
> I'd say relays shouldn't try to do this. If the client is asking, just send the damn thing.

  1. such clients won't send such things
  2. clients that do cache data or use a local cache relay would save people a lot of bandwidth, and would probably substantially improve the fetching of new updates that are only partially populated; disk space for a user's events is a lot cheaper than bandwidth in some places, and many users i read are complaining about clients chewing battery and bandwidth precisely for lack of a feature like this that avoids unnecessary retransmits
  3. what about the fact that negentropy and other relay propagation systems take sometimes minutes or even hours to send messages around, too bad, just use the big relays right? UX doesn't matter, right?
  4. my relay logs everything so i can catch it doing stupid things, or clients doing stupid things, and even with the internal cache turned on, nostrudel fetches the same events like 50 times in a span of 10 minutes - only because it's impossible for it to say "i already have these ones so don't send them again"
  5. do you also own a PC with 256 GB of memory, a 16 TB 4-way RAID of NVMe drives, and a gigabit ethernet internet connection?

@vitorpamplona
Collaborator

vitorpamplona commented Jun 11, 2024

> such clients won't send such things

That's the client's problem, not the relay's.

> clients that do cache data or use an local cache relay would save people a lot of bandwidth and probably result in a substantial improvement of fetching new updates that are partially populated, disk space for a users events is a lot cheaper than bandwidth in some places

yep. Like you said, there is a solution for this already: it's called a cache. There is no need to change relays; clients just implement a cache. It's not a requirement, but people who are concerned about network use should simply avoid clients that don't cache. It's not that hard. It's early. People will implement these caching systems in time.

> many users i read complaining about clients chewing battery and bandwidth due to the lack of such a feature as avoiding unnecessary retransmits

Then just fix the client. If the client is asking for events it already has, then it either has a reason to do it OR it's a bug and can be fixed.

> what about the fact that negentropy and other relay propagation systems take sometimes minutes or even hours to send messages around, too bad, just use the big relays right? UX doesn't matter, right?

I don't understand what big relays and negentropy have to do with this. If you want a faster and simpler SYNC procedure, there are options, like #826. But I assume the issue here is with REQ requests and not sync (they are not the same), so I don't really understand why this line is even here.

> my relay logs everything so i can catch it doing stupid things, or clients doing stupid things, and even with the internal cache turned on, nostrudel fetches the same events like 50 times in a span of 10 minutes

Then fix noStrudel. Don't break the relay behavior for other clients because of one bug in one implementation.

Also, there might be a reason to get everything again. Amethyst needs to re-download things to keep memory use small and avoid Android triggering garbage collection (which hangs the app while it runs). Disk access on phones is quite slow, especially when the user has a non-flagship device. So we have to balance the disk, the memory, and the data usage. If the disk is busy and the memory is full, Amethyst will drop events and then re-request them later if the user opens a screen that needs them. A lot is going on to manage those things without hanging the app.

> do you also own a PC with 256Gb memory, 16Tb 4 way raid NVMe and a gigabit ethernet internet connection?

What does a desktop PC have to do with this? And no, I don't have that.

@mikedilger
Contributor

mikedilger commented Jun 11, 2024

There are reasons to want negative filters

If we had negative filters, I could use a local relay instead of a local custom database with a custom interface. I would love to exclude events that have been "dismissed" by my user (by id), exclude replies with the "annotate" tag (by tag), exclude events authored by me (by pubkey), and other cases too. And if that was a nostr relay REQ/FILTER interface, the code could be a lot simpler and shared much more widely with other code.

You might think "Just ask for more than you want and filter them after they arrive." But if I filter them on my side, I end up with fewer than the LIMIT I was seeking. To get exactly 35 acceptable events, how many should the relay return, when it has no idea which ones will count and which ones won't? This is a hard problem without negative filters.

I'm just stating that I recognize the desire for such a thing. I have not thought too much about whether nostr relays should support negative filters. But if we did it, we should do it for more than just ids.
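Under a hypothetical negative-filter extension (the function and parameter names below are illustrative, not part of NIP-01), the relay would apply the exclusions before counting toward the limit, which is exactly what solves the "exactly 35 events" problem described above:

```python
def apply_filter(events, limit, not_ids=frozenset(), not_authors=frozenset()):
    """Relay-side sketch: exclusions are checked before an event counts
    toward `limit`, so the client gets exactly `limit` acceptable events
    whenever enough exist. `events` is assumed to be newest-first."""
    out = []
    for ev in events:
        if ev["id"] in not_ids or ev["pubkey"] in not_authors:
            continue  # excluded events never consume the limit budget
        out.append(ev)
        if len(out) == limit:
            break
    return out
```

With client-side filtering the relay stops after `limit` raw matches and the client discards some, ending up short; here the relay keeps scanning past excluded events until the quota of acceptable ones is filled.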

@mleku
Author

mleku commented Jun 11, 2024

well, IDs are a start, anyhow, and additionally they can be represented compactly as bloom filters with a low false positive rate

if you'd seen logs of how clients pull events, usually in overlapping windows, it would be really obvious how many times the same events keep getting sent again and again and just ignored
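For scale, the textbook bloom filter sizing formula m = -n·ln(p)/(ln 2)² works out to roughly 14.4 bits per item at a 0.1% false positive rate, which supports the compactness claim; this sketch is not tied to any proposed encoding:

```python
import math

def bloom_size_bytes(n_items, fp_rate):
    """Optimal bloom filter size in bytes: m = -n*ln(p) / (ln 2)^2 bits."""
    m_bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    return math.ceil(m_bits / 8)

def bloom_num_hashes(n_items, fp_rate):
    """Optimal number of hash functions: k = (m/n) * ln 2."""
    m_bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    return max(1, round(m_bits / n_items * math.log(2)))

# 10,000 cached event ids at a 0.1% false positive rate need roughly
# 18 KB, versus 320 KB to list the same ids raw as 32-byte hashes.
```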

@frbitten
Contributor

I think it's an interesting feature, from the point of view that it would be optional: if the client doesn't specify anything, the relay sends everything as it currently does.

It would include negative filters for the following items:

  • IDs
  • tags
  • authors (I don't want to receive messages from people I've blocked)
  • kind

Another alternative to the filter is an "ignore" event in which I indicate what I want to ignore; authenticated relays, which know who the client's user is, could then check what was ignored and not send that data.

@mleku
Author

mleku commented Oct 16, 2024

yeah, i'm not optimistic about this happening, though imo it's an essential feature of a filter query, and it is not difficult to implement: a relay that has the authed user's mute list can prune out the events from those authors

tags don't really need negation, and neither do kinds; they are implicitly whitelist queries

i've watched what clients do and what relays send back, and a frequent pattern is the same events over and over again, plus events from users that the client has muted
