Add REST call and event batch size metrics #131

matthew1001 · 2023-04-18T14:04:16Z

This PR adds a variety of metrics and will close #89 when its complete.

The approach I've used is to add to the existing log interceptor and use that as a point to capture inbound API metrics. The renamed logging-and-metrics interceptor exposes methods to set/inc/observe other metrics e.g. relating to blockchain calls and event processing.

A few more details of the approach I've used are in the issue.

src/eventstream-proxy/eventstream-proxy.base.ts

peterbroadhurst · 2023-04-21T12:52:57Z

src/eventstream-proxy/eventstream-proxy.base.ts

@@ -126,6 +129,21 @@ export abstract class EventStreamProxyBase extends WebSocketEventsBase {
  }

  private async processEvents(batch: EventBatch) {


I was reading the code of this function, and it was unclear to me how the batch optimization is able to be used by FireFly. It appears like we might be expecting individual acks from each event in the batch. Which I think in FireFly Core would mean an expensive DB commit for each event.

The current architecture is exploiting parallelism on the websocket, by dispatching these in parallel. So it might be that in the Core engine, we process them in parallel and pass them to an aggregator thread that does its own batching. That would be an alternative solution to efficient processing. However, that seems significantly more complex than simply propagating the batch as a single contained set that is pre-optimized for processing by Core.

I understand the focus of this PR is metrics, so this is not a blocker to this PR being closed, but if one of the goals is to use metrics to analyze the efficiency of the interface between tokens and FireFly Core, then I think there's a related task to do some code analysis and ensure:

All enrichment actions on a blockchain connector batch are executed in parallel promises within the Token connector

The interface to Core is such, that all enriched events are processed in a single DB transaction and ack'd with a single line turnaround.

Agreed. The metrics in the current PR helped to identify that there is potentially an issue in the way TC enriches events. As you say, it's probably for a separate PR to address any improvements in that regard but if we think any other metrics would be useful we can add them to this one.

Are we not expecting a single ACK from FF core per batch, rather than one per event?

Today we have:

const message: WebSocketMessageWithId = { id: uuidv4(), event: 'batch', data: <WebSocketMessageBatchData>{ events: messages, }, batchNumber: batch.batchNumber, }; this.awaitingAck.push(message);

If I'm understanding correctly, message is a payload containing a batch of events (under the data.events structure).

Then we have the following for handling ACKs:

handleAck(@MessageBody() data: AckMessageData) { if (data.id === undefined) { this.logger.error('Received malformed ack'); return; } const inflight = this.awaitingAck.find(msg => msg.id === data.id);

where inflight has a batchNumber so the handled ACK is presumably for a batch, not an individual event?

What I agree with is that I don't think FF core is treating that batch as a single DB commit. It appears to be doing (at least) one commit per event in the batch.

Correct, the connector currently propagates exactly the batching from the underlying blockchain connector. It does not batch or unbatch anything itself.

So whatever events are received in a batch from evmconnect, those events will be parsed and converted into a new batch of one or more events to be passed back to FireFly. There's an easy optimization to be had here, where we could build an array of promises and wait on them all with Promise.all() rather than awaiting each one in sequence:

firefly-tokens-erc20-erc721/src/eventstream-proxy/eventstream-proxy.base.ts

Line 130 in 1cb1d6c

for (const event of batch.events) {

If there's a request for the token connector to do any intelligent batching of its own (on top of what is done by the blockchain connector), that would definitely be a larger change.

The handling in FireFly core does result in a separate database transaction for each message in the batch. This is because 1) the token plugin has knowledge of the "fftokens" interface and how different types are spelled, but does not have knowledge of databases, and 2) the events manager has knowledge of databases, but not of the internals of the "fftokens" interface.
https://github.com/hyperledger/firefly/blob/f892be6f91f3ed5484f4d4cf9b1b49cd6c23d057/internal/tokens/fftokens/fftokens.go#L530
Reconciling this to provide for all events to be parsed in the context of a single database transaction will require some more thought about the roles of these two components.

Yeah I built a noddy version of your suggestion in https://github.com/kaleido-io/firefly-tokens-erc20-erc721/tree/async-enrichment which showed some very noticeable improvements in rate of batch delivery to FF core. Running some tests with that branch at least moved me on to trying to understand where other event-delivery bottlenecks in the FF stack are.

So to @peterbroadhurst 's suggestions:

All enrichment actions on a blockchain connector batch are executed in parallel promises within the Token connector

Not true today, but should be an easy enhancement.

The interface to Core is such, that all enriched events are processed in a single DB transaction

Not true today, and may be a significant change in FireFly core.

and ack'd with a single line turnaround

This is true today.

https://github.com/hyperledger/firefly/blob/f892be6f91f3ed5484f4d4cf9b1b49cd6c23d057/internal/tokens/fftokens/fftokens.go#L558

Reconciling this to provide for all events to be parsed in the context of a single database transaction will require some more thought about the roles of these two components.

Yes agreed.

src/app.module.ts

awrichar · 2023-05-22T19:04:25Z

src/logging-and-metrics.interceptor.ts

+
+@Module({})
+@Injectable()
+export class LoggingAndMetricsInterceptor implements NestInterceptor {


It feels like this could still be a separate interceptor from RequestLoggingInterceptor. That one is shared across most of our Nest projects, whereas this one feels quite specific to FireFly token connectors.

If there's a benefit to combining these two interceptors, I'm not fundamentally opposed; just seemed like two different tasks.

Personally it felt to me like they were both useful/common things to want to do as part of request interception, so bundled them together for that reason. I'm happy to split them apart if you think that would be best for the codebase, happy for you to take a call on it.

awrichar · 2023-05-22T19:06:28Z

src/main.ts

@@ -51,11 +50,10 @@ export function getApiConfig() {

 async function bootstrap() {
  const app = await NestFactory.create(AppModule);
-  app.setGlobalPrefix('api/v1');
+  app.setGlobalPrefix('api/v1', { exclude: ['/metrics'] });


Why is "metrics" at the root and not under /api?

I put them under metrics as that's the same path that things like FireFly core expose metrics on, so it seemed like the right path to use for TC metrics as well.

awrichar · 2023-05-22T19:10:50Z

src/tokens/tokens.module.ts

-  providers: [TokensService, AbiMapperService, BlockchainConnectorService],
+  providers: [
+    ...MetricProviders,
+    LoggingAndMetricsInterceptor,


Is it necessary to add these providers to individual modules if they've been registered globally?

I found that without adding the providers to the tokens module I hit

[Nest] 16531 - 06/20/2023, 10:00:41 AM ERROR [ExceptionHandler] Nest can't resolve dependencies of the BlockchainConnectorService (HttpService, ?). Please make sure that the argument LoggingAndMetricsInterceptor at index [1] is available in the TokensModule context. Potential solutions: - If LoggingAndMetricsInterceptor is a provider, is it part of the current TokensModule? - If LoggingAndMetricsInterceptor is exported from a separate @Module, is that module imported within TokensModule? @Module({ imports: [ /* the Module containing LoggingAndMetricsInterceptor */ ] })

I'm not sure if there's an alternative way which would mean individual modules don't need to add them?

Signed-off-by: Matthew Whitehead <[email protected]>

Co-authored-by: Peter Broadhurst <[email protected]> Signed-off-by: Matt Whitehead <[email protected]>

Signed-off-by: Matthew Whitehead <[email protected]>

nguyer · 2024-02-15T18:26:33Z

@awrichar This has been open for a long time with lots of questions and lots of answers. What's the verdict? Are more changes requested, or are these changes approved and should be merged?

EnriqueL8 · 2024-04-24T10:34:52Z

@awrichar Could you have another pass at this PR and assess if any more changes needed?

matthew1001 mentioned this pull request Apr 18, 2023

Prometheus Instrumentation #89

Open

matthew1001 force-pushed the prom-metrics branch 2 times, most recently from 06a9a52 to f89c80f Compare April 21, 2023 08:53

matthew1001 marked this pull request as ready for review April 21, 2023 08:56

peterbroadhurst reviewed Apr 21, 2023

View reviewed changes

matthew1001 requested review from awrichar and peterbroadhurst May 9, 2023 14:04

awrichar reviewed May 22, 2023

View reviewed changes

src/app.module.ts Show resolved Hide resolved

awrichar reviewed May 22, 2023

View reviewed changes

matthew1001 and others added 6 commits June 19, 2023 08:49

Add REST call and event batch size metrics

67a5032

Signed-off-by: Matthew Whitehead <[email protected]>

Update src/eventstream-proxy/eventstream-proxy.base.ts

5e24cb7

Co-authored-by: Peter Broadhurst <[email protected]> Signed-off-by: Matt Whitehead <[email protected]>

Add metric to record rate of batch ACKs from client

e58489d

Signed-off-by: Matthew Whitehead <[email protected]>

Fix batch number prop - 'number' -> 'batchNumber'

212748a

Signed-off-by: Matthew Whitehead <[email protected]>

Move where batch & ack timers are set

6d12b63

Signed-off-by: Matthew Whitehead <[email protected]>

Organise imports with npm lint

f85b2d2

Signed-off-by: Matthew Whitehead <[email protected]>

matthew1001 force-pushed the prom-metrics branch from 904b4f9 to f85b2d2 Compare June 19, 2023 11:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add REST call and event batch size metrics #131

Add REST call and event batch size metrics #131

matthew1001 commented Apr 18, 2023 •

edited

Loading

peterbroadhurst Apr 21, 2023

matthew1001 Apr 21, 2023

matthew1001 Apr 21, 2023 •

edited

Loading

awrichar Apr 21, 2023 •

edited

Loading

matthew1001 Apr 21, 2023

awrichar Apr 21, 2023 •

edited

Loading

matthew1001 Apr 21, 2023

awrichar May 22, 2023

matthew1001 Jun 20, 2023

awrichar May 22, 2023

matthew1001 Jun 19, 2023

awrichar May 22, 2023

matthew1001 Jun 20, 2023

nguyer commented Feb 15, 2024

EnriqueL8 commented Apr 24, 2024

		@@ -126,6 +129,21 @@ export abstract class EventStreamProxyBase extends WebSocketEventsBase {
		}

		private async processEvents(batch: EventBatch) {

Add REST call and event batch size metrics #131

Are you sure you want to change the base?

Add REST call and event batch size metrics #131

Conversation

matthew1001 commented Apr 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

matthew1001 Apr 21, 2023 • edited Loading

Choose a reason for hiding this comment

awrichar Apr 21, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

awrichar Apr 21, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nguyer commented Feb 15, 2024

EnriqueL8 commented Apr 24, 2024

matthew1001 commented Apr 18, 2023 •

edited

Loading

matthew1001 Apr 21, 2023 •

edited

Loading

awrichar Apr 21, 2023 •

edited

Loading

awrichar Apr 21, 2023 •

edited

Loading