Add NIP-45 for COUNT #144
This is interesting, but I think it might be better to make a new message, separate from |
A Example: Some relays may also artificially limit all result sets to avoid abuse. Counting rows can be very expensive, so relays should also implement abuse prevention for this type |
I also think that it would be better to simply have a new command 'COUNT' that works exactly like 'REQ' but returns just the events.length: request:
response:
And what is the expected behaviour if you don't close the req? Should the relay keep sending new counts (i.e. num++), or is this a single message, like a NOTICE? |
I think I prefer @eskema's version as well. I don't think streaming count is useful enough for the complexity, so I would make it just a single message. I'll revise the NIP when I have a chance. |
798bfa1 to 7970296
Updated the PR, let me know what you think. I'm slightly inclined to make this a little more open ended in case someone wants something other than a count, for example a unique list of pubkeys, or rounded time series data. But I suppose we can re-use the GROUP verb with a different request verb. |
Not sure I like this approach; the groups seem unnecessary. Why not send a new req for each count you need instead of lumping them together like that? Or just return a count for each filter if you want multiples in a req. Subgroups in the reqs seem complicated for no reason. Why would you lump a bunch of requests together and ask for separate counts? |
In order to get certain groups, you would have to know what filters to use, which would require retrieving all events in the first place. For example, notes by pubkey:
|
Isn't this trying to replace SQL with a JSON query language? I don't have a clear reason for this, but it doesn't seem right to treat relays as general-purpose databases. |
It does cover the GROUP BY case popularized by SQL, yes. Just like limit = limit, since/until = order by, offset, kind/#e/#p/author = where. Grouping is useful. |
I think it should simply be:
i.e, you do the grouping yourself before asking the relay, or
and the response be |
@fiatjaf I see what you mean though, this opens the door to more and more complexity, while second-layer protocols or centralized services can perform this function successfully. I'm just not sure where to draw the line. Probably this side of optimizations. |
897cd31 to dcc111a
Alright, since groups seem to be unpopular I've removed them, please take another look. |
Reactions are the one thing I see that would profit most from a "count" NIP, but in its current form they don't profit at all. As the example was not brought up in the discussion, please pardon me for beating this dead horse: I think the most useful approach for light clients would be to stay close to the
While I think the above is preferable, it could of course also be pruned by some implicitly given information:
|
Also, while I do feel pity for my server that has to process tons of data to return some simple "12432", we have to see what is the most efficient way overall of achieving certain results. If the server does 3x the work so the client can do just 1/5 of the work, then that is acceptable for some client devs and server operators. I assume that at scale, users will pay one way or another for resources used on the servers, and I'm totally fine with a free tier that does not see likes beyond likes from their follows, for example. The idea here is to standardize what's useful for some and I totally see the usefulness for some in the |
Ack, I've had this implemented in wss://relay.nostr.band for more than a month. Somehow the discussion got lost in the telegram group. You can try |
Concept ACK |
How about adding that clients should check NIP-11 first to see if the relay supports NIP-45 before issuing COUNT requests? |
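A minimal sketch of that check, assuming the relay serves a NIP-11 relay information document over HTTP with a `supported_nips` list. The helper names `supports_nip` and `fetch_relay_info` are made up for illustration and are not part of any NIP or library:

```python
import json
import urllib.request


def supports_nip(info: dict, nip: int) -> bool:
    """Return True if a NIP-11 relay information document lists `nip`."""
    supported = info.get("supported_nips", [])
    # Some relays may omit the field or send something unexpected.
    return isinstance(supported, list) and nip in supported


def fetch_relay_info(relay_url: str, timeout: float = 5.0) -> dict:
    """Fetch the NIP-11 document for a ws:// or wss:// relay URL (hypothetical helper)."""
    if relay_url.startswith("wss://"):
        http_url = "https://" + relay_url[len("wss://"):]
    elif relay_url.startswith("ws://"):
        http_url = "http://" + relay_url[len("ws://"):]
    else:
        raise ValueError("expected a ws:// or wss:// URL")
    request = urllib.request.Request(
        http_url, headers={"Accept": "application/nostr+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A client could call `supports_nip(fetch_relay_info(url), 45)` once per relay and skip COUNT for relays that return False.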
We should probably merge this. |
Different relays have different events and counts, so how would you get to the final count? You can't just add them up. Right now you have to get different events from multiple relays and filter out the duplicates, and that's the count from the client's perspective. |
That's a super good point, I hadn't thought of that. I do think COUNT could still be useful as an approximation (since it is anyway), you could take the max of multiple results, or select which relay to request a count from. |
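To make the difference concrete, here is a small sketch of the two approaches discussed: the exact client-side count (download events, deduplicate by event id) and the max-based approximation over per-relay counts. The function names and the sample data are invented for illustration:

```python
def exact_count(events_by_relay: dict) -> int:
    """Count unique events across relays by deduplicating on the event id."""
    unique_ids = {
        event["id"]
        for events in events_by_relay.values()
        for event in events
    }
    return len(unique_ids)


def approximate_count(counts_by_relay: dict) -> int:
    """Approximate the total as the largest single-relay count (a lower bound)."""
    return max(counts_by_relay.values(), default=0)


# Two hypothetical relays that share one event ("b").
events = {
    "relay-one": [{"id": "a"}, {"id": "b"}],
    "relay-two": [{"id": "b"}, {"id": "c"}],
}
counts = {name: len(found) for name, found in events.items()}

exact = exact_count(events)            # deduplicated across relays
approx = approximate_count(counts)     # never overcounts, may undercount
naive_sum = sum(counts.values())       # double-counts the shared event
```

The max never exceeds the true count, while the naive sum never falls below it, so the two bracket the real number without downloading any events.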
Aaand I've changed my mind, I've decided that COUNT is not useful. If we add relay extensions we could put something together that would be more complete. |
Hi, I'm adding COUNT to my metadata / contact indexing nodes (wss://us.rbr.bio and wss://eu.rbr.bio) for follower counts, and it was easy to implement, but I have a few questions: Why {count: 30} when ["COUNT","subname",30] would already contain the information and is simpler? For group_by extensions it makes sense to have an array, but for simple counts I would prefer just a simple number (this is just a nitpick though). The more interesting question: I would just like to support a few types of COUNT and group_by queries. I think the best would be if I could specify the types of queries my relay supports in the relay information document (although that should be handled in another NIP). And the main question: how should the server reply if it doesn't handle the specific COUNT query? Should the relay just reply with a NOTICE, or EOSE, or do nothing? I guess NOTICE would be the most backwards compatible, but NOTICE doesn't contain the subscription name, as it just supports 1 message. |
One more question: the NIP contains just "" as the second parameter, but I guess it's a label (maybe not a subscription) that helps identify the reply. This should be made clear in the documentation. |
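For reference, a rough sketch of how a client might build a COUNT request and match the reply to its label. It assumes the reply has the shape `["COUNT", <label>, {"count": <n>}]`, i.e. the `{count: 30}` form mentioned above. The function names and the placeholder pubkey are hypothetical:

```python
import json
from typing import Optional


def build_count_request(label: str, *filters: dict) -> str:
    """Serialize a COUNT request; it takes the same filters as a REQ."""
    if not filters:
        raise ValueError("at least one filter is required")
    return json.dumps(["COUNT", label, *filters])


def parse_count_response(raw: str, label: str) -> Optional[int]:
    """Return the count if `raw` is a COUNT reply for `label`, else None."""
    message = json.loads(raw)
    if not (isinstance(message, list) and len(message) == 3):
        return None
    verb, reply_label, payload = message
    if verb != "COUNT" or reply_label != label:
        return None
    # Assumed payload shape: {"count": <integer>}.
    if isinstance(payload, dict) and isinstance(payload.get("count"), int):
        return payload["count"]
    return None


# Follower count: contact lists (kind 3) that tag a placeholder pubkey.
request = build_count_request("followers", {"kinds": [3], "#p": ["<pubkey>"]})
```

Anything that is not a matching COUNT reply (a NOTICE, for instance) comes back as None, so the caller can decide how to treat a relay that does not answer the query.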
This is exactly the place that COUNT makes sense. It's not good for direct use with multiple relays, but if you're using a relay multiplexer or indexer, COUNT can work. To answer @adamritter's questions:
I'm not sure about selectively supporting COUNT attributes/groups, can you share the reason for that? |
092d8f5 to 396e52d
I'm running 2 relay servers that hold all metadata and contacts of all relays in RAM and serve them (wss://us.rbr.bio and wss://eu.rbr.bio). I implemented it with hash maps in RAM; I don't use any query engine. I've already added follower count support, but it's also a specific hash map just for followers: https://iris.to/npub1dcl4zejwr8sg9h6jzl75fy4mj6g8gpdqkfczseca6lef0d5gvzxqvux5ey I'm also planning to implement group_by authors/pubkey just for getting the list of all followers, as it's important and supported by the data structures in my server. I was thinking of selective support because most relay implementations have secondary indices that efficiently support count for certain types of queries, but it may not be important to require those indices to be used for group_by operations (for example, the reactions are not that many, so group_by is easy for reactions on the server even if there is no index for it). To tell you the truth, I'm also not sure about this selective support (I'm trying to conform to the standard as much as possible), but what's more important is that even if we do it, it shouldn't be part of this NIP, so there's no need to make a decision. There may be millions of group_by results though, so the group_by NIP should maybe specify a limit and return a limited number of results (relays usually have a default max limit anyway). |
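As an illustration only (this is not the rbr.bio implementation), a follower index along these lines could be sketched with plain hash maps. It assumes kind 3 contact-list events with "p" tags, keeps only the newest list per author so outdated lists don't inflate counts, and caps list results with a limit. The class and method names are invented:

```python
from collections import defaultdict


class FollowerIndex:
    """In-memory follower index built from kind 3 contact lists (a sketch)."""

    def __init__(self):
        self._latest = {}                    # author -> (created_at, followed set)
        self._followers = defaultdict(set)   # pubkey -> set of follower pubkeys

    def add_contact_list(self, event: dict) -> None:
        if event.get("kind") != 3:
            return
        author, created_at = event["pubkey"], event["created_at"]
        previous = self._latest.get(author)
        if previous is not None and previous[0] >= created_at:
            return  # ignore lists older than the one already indexed
        if previous is not None:
            for followed in previous[1]:
                self._followers[followed].discard(author)
        followed_now = {
            tag[1]
            for tag in event.get("tags", [])
            if len(tag) >= 2 and tag[0] == "p"
        }
        for followed in followed_now:
            self._followers[followed].add(author)
        self._latest[author] = (created_at, followed_now)

    def count(self, pubkey: str) -> int:
        """Answer a follower COUNT without scanning any events."""
        return len(self._followers.get(pubkey, ()))

    def followers(self, pubkey: str, limit: int = 1000) -> list:
        """Return at most `limit` followers, in sorted order for determinism."""
        return sorted(self._followers.get(pubkey, ()))[:limit]
```

With an index like this a count is a single lookup, which is why a relay might support this query shape efficiently while having no fast path for arbitrary filters.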
The current specification doesn't specify if multiple counts are supported or not in 1 query (I think they should be supported). Right now there's both {count: 3} and [{count:3}] as suggestions for simple counts, I prefer {count:3}, and using arrays for group by when we extend count to support group by (which is needed) |
I implemented basic group_by support for getting followers on wss://us.rbr.bio:
(returns top 1000 followers by popularity) |
I've re-introduced |
This still looks like it is not really solving any problems, especially the |
Can you elaborate? Currently coracle only uses COUNT in one place, but it allows me to avoid downloading many megabytes of data to populate a single number. A couple use cases for group_by are enumerated above. Your comment that it might be burdensome to relays is valid, but it's opt-in as written, and can be useful for analytics as well as more common uses. |
Follower count will not work with this accurately. Outdated lists, sending to multiple relays, ...
Yes, however it will work if asking a single relay, a multiplexer (which can do the heavy lifting on its end, saving client bandwidth), or an indexer that scrapes a total from many relays. |
For example, now Coracle says I have 47k followers when in fact the truth is probably more like 5k. But yes, I think |
I can't imagine with 2 million profiles (yes, many fake), that you would only have 5k followers. |
0e7300a merged. I feel the |
This is the beginning of sort of a second-layer indexing scheme, I came across the need for it when implementing follower count. Currently I'm downloading ~4000 events in the case of jack, and throwing them away just to get the count.