Add NIP-45 for COUNT #144

staab · 2023-01-04T04:13:36Z

This is the beginning of sort of a second-layer indexing scheme, I came across the need for it when implementing follower count. Currently I'm downloading ~4000 events in the case of jack, and throwing them away just to get the count.

fiatjaf · 2023-01-04T10:51:16Z

This is interesting, but I think it might be better to make a new message, separate from REQ, to just ask for counts. It could take the same filter format and just return the count of events instead of the events themselves.

v0l · 2023-01-04T11:05:29Z

A REQC command could be trivial to implement for relays using the existing REQ filter format, a response EVENTC message can return the count for each filter:

Example:
["REQC", { filter1 }, { filter2 }]
["EVENTC", n_filter1, n_filter2, ...]

Some relays may also artificially limit all results sets to avoid abuse, counting rows can be very expensive so relays should also implement abuse prevention for this type

eskema · 2023-01-04T15:44:50Z

I also think that it would be better to simply have a new command 'COUNT' that works exactly like 'REQ' but returns just the events.length:

request:

['COUNT', 'sub_id', {filter}]

response:

['COUNT', 'sub_id', 'num']

and what is the expected behaviour if you don't close the req? should the relay keep sending new counts (i.e. num++ ) or is this like a Notice single message?

staab · 2023-01-04T16:25:51Z

I think I prefer @eskema's version as well. I don't think streaming count is useful enough for the complexity, so I would make it just a single message. I'll revise the NIP when I have a chance.

staab · 2023-01-06T01:40:15Z

Updated the PR, let me know what you think. I'm slightly inclined to make this a little more open ended in case someone wants something other than a count, for example a unique list of pubkeys, or rounded time series data. But I suppose we can re-use the GROUP verb with a different request verb.

eskema · 2023-01-06T01:54:29Z

not sure I like this approach, the groups seem unnecessary, why not send a new req for the counts you need instead of lumping it like that? or, just return a count for each filter if you want multiples in a req, but subgroups in the reqs seem complicated for no reason. why woold you lump a bunch of requests and ask separate counts?

staab · 2023-01-06T01:58:23Z

In order to get certain groups, you would have to know what filters to use, which would require retrieving all events in the first place. For example, notes by pubkey:

["COUNT", "", {filters: {kinds: [1], since: <timestamp>}, group_by: ['pubkey']}]
["GROUP", "", {group: [<pubkey1>], count: 238}]
["GROUP", "", {group: [<pubkey1>], count: 21}]

fiatjaf · 2023-01-06T01:58:41Z

Isn't this trying to replace SQL with a JSON query language?

I don't have a clear reason for this, but it doesn't seem right to treat relays as general-purpose databases.

staab · 2023-01-06T01:59:36Z

It does cover the GROUP BY case popularized by sql, yes. Just like limit = limit, since/until = order by, offset, kind/#e/#p/author = where. Grouping is useful.

eskema · 2023-01-06T02:04:42Z

I think it should simply be:

["COUNT", "", {kinds: [1], since: <timestamp>, authors: ['pubkey1']}]
["COUNT", "", {kinds: [1], since: <timestamp>, authors: ['pubkey2']}]

i.e, you do the grouping yourself before asking the relay, or

["COUNT", "", {kinds: [1], since: <timestamp>, authors: ['pubkey1']},  {kinds: [1], since: <timestamp>, authors: ['pubkey2']}]

and the response be count1, count2

staab · 2023-01-06T02:09:11Z

@fiatjaf I see what you mean though, this opens the door to more and more complexity, while second-layer protocols or centralized services can perform this function successfully. I'm just not sure where to draw the line. Probably this side of optimizations.

staab · 2023-01-11T16:13:30Z

Alright, since groups seem to be unpopular I've removed them, please take another look.

Giszmo · 2023-01-11T18:34:31Z

Reactions are the one thing I see that would most profit from a "count" nip but in its current form it doesn't profit at all and as the example was not brought up in the discussion, please pardon for beating this dead horse:

I think the most useful for light clients would be to stay close to the REQ syntax but optionally allow to group_by. To still allow matching queries to results, the easiest approach is to have an array per query, with those using group_by returning the fields grouped by, too.

['COUNT', '',
    {kinds: [7], '#e': [<eid1>, <eid2>], group_by: ['#e', 'content']},
    {kinds: [1717171], '#e': [<eid1>, <eid2>], group_by: ['#e', 'content']},
    {kinds: [3]}
    ]
['COUNT', '', 
    [{'#e': <eid1>, content: '', count: 12},
        {'#e': <eid1>, content: '+', count: 5},
        {'#e': <eid1>, content: '+ ', count: 1},
        {'#e': <eid1>, content: 'banana', count: 1},
        {'#e': <eid1>, content: '🤢', count: 3},
        {'#e': <eid1>, content: '🍆', count: 2},
        {'#e': <eid2>, content: '', count: 1}
    ],
    [],
    [{count: 5745}]]

While I think the above is preferable, it could of course also be pruned by some implicitly given information:

['COUNT', '',
    {kinds: [7], '#e': [<eid1>, <eid2>], group_by: ['#e', 'content']},
    {kinds: [1717171], '#e': [<eid1>, <eid2>], group_by: ['#e', 'content']},
    {kinds: [3]}
    ]
['COUNT', '', 
    [[<eid1>,
        ['', 12],
        ['+', 5],
        ['+ ', 1],
        ['banana', 1],
        ['🤢', 3],
        ['🍆', 2]],
    [[<eid2>,
        ['', 1]],
    [],
    5745]

Giszmo · 2023-01-11T19:02:05Z

Also while I do feel pity for my server that has to process tons of data to return some simple "12432", we have to see what is the overall most efficient way of achieving certain results and if the server doing x3 the work so the client can do just 1/5 of the work then that is acceptable for some client devs and server operators.

I assume that at scale, users will pay one way or another for resources used on the servers and I'm totally fine with a free tier that does not see likes beyond likes from their follows for example. The idea here is to standardize what's useful for some and I totally see the usefulness for some in the group_by.

brugeman · 2023-01-24T17:34:38Z

Ack, I've had this implemented in wss://relay.nostr.band more than a month ago, somehow discussion got lost in the telegram group, you can try
["COUNT","1",{"#p":["84dee6e676e5bb67b4ad4e042cf70cbd8681155db535942fcc6a0533858a7240"], "kinds":[3]}]

pseudozach · 2023-01-31T21:02:31Z

Concept ACK
I'd like to see this functionality, will test relays with this capability and implement it in my client.

mikedilger · 2023-01-31T22:48:16Z

How about adding that clients should check NIP-11 first to see if they support NIP-45 before issuing COUNT requests.

fiatjaf · 2023-02-01T00:35:35Z

We should probably merge this.

fabianfabian · 2023-02-02T06:59:53Z

Different relays have different events and counts, so how would you get to the final count? You can't just add them up. Right now you have to get different events from multiple relays, filter the duplicates and thats the count from the client perspective.

staab · 2023-02-02T22:44:46Z

That's a super good point, I hadn't thought of that. I do think COUNT could still be useful as an approximation (since it is anyway), you could take the max of multiple results, or select which relay to request a count from.

staab · 2023-02-15T17:49:05Z

Aaand I've changed my mind, I've decided that COUNT is not useful. If we add relay extensions we could put something together that would be more complete.

adamritter · 2023-03-12T20:36:59Z

Hi,

I'm adding COUNT to my metadata / contact indexing nodes (wss://us.rbr.bio and wss://eu.rbr.bio) for follower counts, and it was easy to implement, but I have a few questions:

Why {count: 30} when ["COUNT","subname",30] would already contain the information and is simpler? For group_by extensions it makes sense to have an array, but for simple counts I would prefer just a simple number (this is just a nitpick though).

The more interesting question: I would just like to support a few types of COUNT and group_by queries. I think the best would be if I could specify the types of queries my relay supports in the relay information document (although that should be handled in another NIP).

And the main question: how should the server reply if it doesn't handle the specific COUNT query? Should the relay just reply with a NOTICE, or EOSE, or do nothing? I guess NOTICE would be the most backwards compatible, but NOTICE doesn't contain the subscription name, as it just supports 1 message.

adamritter · 2023-03-12T20:40:08Z

One more extra question: the NIP contains just "" as the second parameter, but I guess it's a label (maybe not subscription) that helps identifying the reply. It should be clear in the documentation.

staab · 2023-03-13T17:35:42Z

I'm adding COUNT to my metadata / contact indexing nodes

This is exactly the place that COUNT makes sense. It's not good for direct use with multiple relays, but if you're using a relay multiplexer or indexer, COUNT can work.

To answer @adamritter's questions:

For simple counts I'm not using a number so that additions to the NIP can be made later without breaking backwards compatibility. One rule of thumb I've learned is always return an associative data type from APIs that aren't easily refactored.
I would guess NOTICE if not supported, or just ignore the request, but I don't really know.
The "" parameter is the subscription id, standard with REQ etc. Probably a good idea to clarify though

I'm not sure about selectively supporting COUNT attributes/groups, can you share the reason for that?

adamritter · 2023-03-13T20:35:00Z

I'm not sure about selectively supporting COUNT attributes/groups, can you share the reason for that?

I'm running 2 relay servers that hold all metadata and contacts of all relays in RAM and serves them (wss://us.rbr.bio and wss://eu.rbr.bio).

I implemented it with hash maps in RAM, I don't use any query engine.

Already added follower counts support, but it's also a specific hash map just for followers:

https://iris.to/npub1dcl4zejwr8sg9h6jzl75fy4mj6g8gpdqkfczseca6lef0d5gvzxqvux5ey

I'm also planning to implement group_by authors/pubkey just for getting the list of all followers, as it's important and supported by the data structures in my server.

I was thinking of the selective supporting because most relay implementations have secondary indices that efficiently support count for certain types of queries, but it maybe not important to require those indices to be used for group_by operations (for example the reactions are not that many so group_by is easy for reactions on the server even if there is no index for it).

To tell you the truth, I'm also not sure about this selective support (I'm trying to conform to the standard as much as possible), but what's more important is that even if we do it, it shouldn't be part of this NIP, so there's no need to take a decision.

There may be millions of group by results though, so the group_by NIP should maybe specify a limit: and returning limited number of results (relays usually have a default max limit anyways).

adamritter · 2023-03-13T22:01:15Z

The current specification doesn't specify if multiple counts are supported or not in 1 query (I think they should be supported).

Right now there's both {count: 3} and [{count:3}] as suggestions for simple counts, I prefer {count:3}, and using arrays for group by when we extend count to support group by (which is needed)

adamritter · 2023-03-14T01:15:25Z

I implemented basic group_by support for getting followers on wss://us.rbr.bio:

["COUNT","hello",
      {"#p":["85080d3bad70ccdcd7f74c29a44f55bb85cbcd3dd0cbb957da1d215bdb931204"],"kinds":[3]},
      {"#p":["85080d3bad70ccdcd7f74c29a44f55bb85cbcd3dd0cbb957da1d215bdb931204"],"kinds":[3],"group_by":["pubkey"]}]

["COUNT","hello",{"count":50538}, 
  [{"pubkey":"82341f882b6eabcd2ba7f1ef90aad961cf074af15b9ef44a09f9d2a8fbfbe6a2","count":1},...

(returns top 1000 followers by popularity)

staab · 2023-03-31T14:44:55Z

I've re-introduced group_by to fit @adamritter's implementation. If relays prefer not to implement it, they can raise a NOTICE or ignore the group_by key. I think we should go ahead and merge this, as there are multiple implementations in the wild, and despite the problems with merging results from a relay, COUNT can be done more reliably by either indexers or multiplexers.

fiatjaf · 2023-03-31T15:24:46Z

This still looks like it is not really solving any problems, specially the group_by clause, it makes no sense to me.

staab · 2023-03-31T15:35:36Z

Can you elaborate? Currently coracle only uses COUNT in one place, but it allows me to avoid downloading many megabytes of data to populate a single number. A couple use cases for group_by are enumerated above. Your comment that it might be burdensome to relays is valid, but it's opt-in as written, and can be useful for analytics as well as more common uses.

Semisol

Follower count will not work with this accurately. Outdated lists, sending to multiple relays, ...

staab · 2023-03-31T23:09:09Z

Follower count will not work with this accurately. Outdated lists, sending to multiple relays, ...

Yes, however it will work if asking a single relay, a multiplexer (which can do the heavy lifting on its end, saving client bandwidth), or an indexer that scrapes a total from many relays.

fiatjaf · 2023-04-01T11:27:01Z

For example, now Coracle says I have 47k followers when in fact the truth is probably more like 5k.

But yes, I think COUNT is fine, even though it is not as useful as it would be on a centralized network, my problem is really with the group_by.

staab · 2023-04-01T12:31:28Z

I can't imagine with 2 million profiles (yes, many fake), that you would only have 5k followers.

fiatjaf · 2023-04-05T21:43:16Z

0e7300a merged. I feel the group_by is too messy. I may be wrong, but I would like to be proven wrong first. Then we can edit this.

jblachly mentioned this pull request Jan 5, 2023

Pulling follower count/event count is super inefficient and bandwidth intensive [NIP-45][Bounty 300,000 sats] damus-io/damus#85

Closed

staab force-pushed the nip-45 branch 2 times, most recently from 798bfa1 to 7970296 Compare January 6, 2023 01:40

staab force-pushed the nip-45 branch 2 times, most recently from 897cd31 to dcc111a Compare January 11, 2023 16:12

staab changed the title ~~Add NIP-45, which defines a attribute on REQs~~ Add NIP-45 for COUNT Jan 26, 2023

staab mentioned this pull request Feb 15, 2023

Update NIP 11 to support relay recommendations #259

Closed

staab mentioned this pull request Mar 8, 2023

NIP-02 (Kind 3) doesn't match what it's currently used for (following/followers lists) and doesn't scale for that purpose #335

Closed

staab force-pushed the nip-45 branch 2 times, most recently from 092d8f5 to 396e52d Compare March 13, 2023 17:36

Jonathan Staab added 2 commits March 31, 2023 09:35

Add NIP-45, which defines a COUNT verb

0e7300a

Add group_by back in to NIP-45

5714e79

staab force-pushed the nip-45 branch from 396e52d to 5714e79 Compare March 31, 2023 14:43

Semisol reviewed Mar 31, 2023

View reviewed changes

fiatjaf closed this Apr 5, 2023

Add NIP-45 for COUNT #144

Add NIP-45 for COUNT #144

Conversation

staab commented Jan 4, 2023 • edited Loading

fiatjaf commented Jan 4, 2023

v0l commented Jan 4, 2023

eskema commented Jan 4, 2023 • edited Loading

staab commented Jan 4, 2023

staab commented Jan 6, 2023 • edited Loading

eskema commented Jan 6, 2023

staab commented Jan 6, 2023 • edited Loading

fiatjaf commented Jan 6, 2023 • edited Loading

staab commented Jan 6, 2023

eskema commented Jan 6, 2023 • edited Loading

staab commented Jan 6, 2023

staab commented Jan 11, 2023

Giszmo commented Jan 11, 2023 • edited Loading

Giszmo commented Jan 11, 2023

brugeman commented Jan 24, 2023

pseudozach commented Jan 31, 2023

mikedilger commented Jan 31, 2023

fiatjaf commented Feb 1, 2023

fabianfabian commented Feb 2, 2023

staab commented Feb 2, 2023

staab commented Feb 15, 2023

adamritter commented Mar 12, 2023

adamritter commented Mar 12, 2023

staab commented Mar 13, 2023

adamritter commented Mar 13, 2023

adamritter commented Mar 13, 2023

adamritter commented Mar 14, 2023 • edited by fiatjaf Loading

staab commented Mar 31, 2023

fiatjaf commented Mar 31, 2023

staab commented Mar 31, 2023

Semisol left a comment

Choose a reason for hiding this comment

staab commented Mar 31, 2023

fiatjaf commented Apr 1, 2023 • edited Loading

staab commented Apr 1, 2023

fiatjaf commented Apr 5, 2023

staab commented Jan 4, 2023 •

edited

Loading

eskema commented Jan 4, 2023 •

edited

Loading

staab commented Jan 6, 2023 •

edited

Loading

staab commented Jan 6, 2023 •

edited

Loading

fiatjaf commented Jan 6, 2023 •

edited

Loading

eskema commented Jan 6, 2023 •

edited

Loading

Giszmo commented Jan 11, 2023 •

edited

Loading

adamritter commented Mar 14, 2023 •

edited by fiatjaf

Loading

fiatjaf commented Apr 1, 2023 •

edited

Loading