Huge performance hit on stream with 3M unique subjects [v2.10.18]

Observed behavior
We have built a pipeline that produces an entity whose data is composed on the fly from various information sources.
```
Information for Stream PPSTREAM created 2024-08-17 18:27:32

             Subjects: pipeline.product.>
             Replicas: 1
              Storage: File

Options:

            Retention: Limits
      Acknowledgments: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: true

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: 1
        Maximum Bytes: 200 GiB
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

State:

             Messages: 3,359,000
                Bytes: 14 GiB
       First Sequence: 114,332 @ 2024-08-18 13:51:10
        Last Sequence: 3,602,895 @ 2024-08-18 19:29:37
     Deleted Messages: 129,564
     Active Consumers: 2
   Number of Subjects: 3,359,000
```
The messages are stored in a stream with a Per Subject Messages Limit of 1, and incoming messages follow the subject template `product.{sku}.{source}`.
This allows us to keep in NATS the latest information from each source for a given product.
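As an illustration of this latest-value behavior, a publish might look like the sketch below. The SKU, source name, and payload are made up, `msgpackr` is assumed to be the origin of the `unpack()` call shown further down, and note that the stream info above lists `pipeline.product.>` as the subject space, so the exact prefix may differ from the template:

```ts
import { connect, headers } from 'nats'
import { pack } from 'msgpackr'

const nc = await connect({ servers: 'localhost:4222' })
const js = nc.jetstream()

// With Maximum Per Subject = 1, this publish displaces any earlier
// message on product.SKU123.supplier-feed, so the stream always keeps
// only the newest snapshot for that {sku}.{source} pair.
const h = headers()
h.set('source', 'supplier-feed')
await js.publish('product.SKU123.supplier-feed', Bun.gzipSync(pack({ price: 9.99 })), { headers: h })

await nc.drain()
```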
The pipeline subscribes to `product.>` to trigger the composition of a product sheet, which it stores in a KV bucket (a concept similar to an event store). During this phase, the pipeline must collect all the messages concerning a SKU (`product.{sku}.>`).
To do this, an ephemeral consumer is created to collect these messages and is then deleted:
```ts
let msg: JsMsg | null = null
while ((msg = await con.next()) !== null) {
  const source = msg.headers?.get('source')
  messages.push({
    source: source ?? 'unknown',
    data: unpack(Bun.gunzipSync(msg.data)),
  })
  const info = await con.info()
  if (info.num_pending === 0) {
    break
  }
}
con.delete() // unawaited to save RTT
```
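For reference, the report does not show how the ephemeral consumer `con` is created; a minimal sketch with the nats.js JetStream API (assuming `nc`, `js`, and `sku` are in scope, and picking an arbitrary 5 s inactive threshold) could look like:

```ts
import { AckPolicy } from 'nats'

// No durable_name makes the consumer ephemeral: the server removes it
// after the inactive threshold even if the explicit delete() is lost.
const jsm = await nc.jetstreamManager()
const ci = await jsm.consumers.add('PPSTREAM', {
  filter_subject: `product.${sku}.>`,
  ack_policy: AckPolicy.None,
  inactive_threshold: 5_000_000_000, // in nanoseconds
})
const con = await js.consumers.get('PPSTREAM', ci.name)
```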
This runs once for each message in the stream.
We confirmed that it is the `next()` call that gets slower.
We also tried `consume()` with an async iterator, which performs even worse here.
With a base of 50,000 messages, this phase takes an average of 7 ms, which is acceptable at this stage.
However, with 3,500,000 messages, the same phase takes an average of 420 ms, which represents a significant performance hit.
It is important to note that no publishing occurs in parallel: the messages were loaded in advance, before the test was triggered. Disk I/O is pretty low.

Expected behavior

We expect a smaller performance hit.

Server and client version

The server is a standalone 2.10.18.
JavaScript client 2.28.2.

Host environment

Docker version 26.0.2
Volume driver: local

Steps to reproduce

No response
Each call to `consumer.info()` makes an API request to fetch the latest info. This has quite a large overhead when done for every message, and can be simplified by getting the pending count directly from the message metadata:
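The snippet that followed was lost in the capture; a minimal sketch of that simplification, assuming the loop from above and the `JsMsg.info` metadata that nats.js attaches to each delivered message, would be:

```ts
let msg: JsMsg | null = null
while ((msg = await con.next()) !== null) {
  const source = msg.headers?.get('source')
  messages.push({
    source: source ?? 'unknown',
    data: unpack(Bun.gunzipSync(msg.data)),
  })
  // msg.info is decoded locally from the message's reply subject, so
  // reading the pending count costs no extra round trip to the server.
  if (msg.info.pending === 0) {
    break
  }
}
con.delete() // unawaited to save RTT
```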
I made a POC with rudimentary code, replacing the KV bucket and JetStream with MongoDB collections, but using the same code base, plus some indexes.
I know the driver is optimized and enqueues changes, but I set the write concern to 1, so I get an ack from the MongoDB instance for every write, effectively throttling the client so it does not overwhelm the server.
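For context, a sketch of how such a POC might be wired up with the Node.js MongoDB driver (connection string, database, collection, and index are all hypothetical):

```ts
import { MongoClient } from 'mongodb'

// writeConcern w:1 makes every write wait for the primary's ack, so the
// client is throttled instead of the driver queueing writes far ahead.
const client = new MongoClient('mongodb://localhost:27017', {
  writeConcern: { w: 1 },
})
const products = client.db('poc').collection('products')

// One document per {sku, source} pair, mirroring Per Subject Limit 1.
await products.createIndex({ sku: 1, source: 1 }, { unique: true })
await products.updateOne(
  { sku: 'SKU123', source: 'supplier-feed' },
  { $set: { data: { price: 9.99 } } },
  { upsert: true },
)
```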
I got these results:
Tell me if I can help in any way.