Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[APM] Metrics powered UI POC #66871

Closed
wants to merge 4 commits into from

Conversation

dgieselaar
Copy link
Member

@dgieselaar dgieselaar commented May 18, 2020

This is a POC to demonstrate some of the work we need to do to migrate to a metrics powered UI. The approach is:

  • in setup_request, determine if, for the given time range, the cluster has 1 or more transaction duration metricsets
  • If this is the case, use the metrics index and aggregate on transaction.duration.histogram rather than transaction.duration.us

I've opted for this approach because there are several scenarios to consider, and this is the simplest decision tree I can think of. For instance, the user could have transaction duration metrics, but could be looking at an earlier time range, or they could be using the query bar and filtering on non-aggregated fields.

I've also split up the services aggregations into smaller requests (which is recommended to better utilize parallelism.

I haven't done any performance comparisons yet.

To try this out, you'll need to enable aggregation in APM Server: -E apm-server.aggregation.enabled=true. See elastic/apm-server#3651.

Before:
Screenshot 2020-05-18 at 09 59 35

After:
Screenshot 2020-05-18 at 09 58 20

(Some data is off due to (some of the) agents not running for the full 24 hours).

@botelastic botelastic bot added the Team:APM All issues that need APM UI Team support label May 18, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/apm-ui (Team:apm)

@sorenlouv sorenlouv linked an issue May 20, 2020 that may be closed by this pull request

const index = [
indices['apm_oss.metricsIndices'],
indices['apm_oss.errorIndices']
Copy link
Member

@sorenlouv sorenlouv May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any plans to pre-aggregate errors in the future?


if (!hasTransactionDurationMetrics) {
index.push(indices['apm_oss.transactionIndices']);
}
Copy link
Member

@sorenlouv sorenlouv May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having this for every query, what about letting the APM elasticsearch client handle this? So the consumer side specifies the indicies that should be queried as strings (typescript can enforce correctness):

{
  apmIndices: ['transactions', 'errors']
  body: {...}
}

Then the client will determine which indices should be queried according to hasTransactionDurationMetrics:

const indices = apmIndices.map(apmIndex => {
 switch(apmIndex) {
  case 'span': 
    return indices['apm_oss.spanIndices'];

  case 'error': 
    return indices['apm_oss.errorIndices'];

  case 'metric': 
    return indices['apm_oss.metricsIndices'];

  case 'transaction': 
    return hasTransactionDurationMetrics 
      ? indices['apm_oss.metricsIndices'] 
      : indices['apm_oss.transactionIndices'] 
 }
})

We need this client anyway to make it easier when teams outside APM needs want to query apm data (siem for instance)

filter: [
{
terms: { [PROCESSOR_EVENT]: ['transaction', 'error', 'metric'] }
},
Copy link
Member

@sorenlouv sorenlouv May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the processor.event filter could cause problems because by default we query apm-* (all indices). So this query will return transactions, metrics, errors and spans by default.
I think you'll need the same logic as for the index.

I think the queried indicies and processor.event should always be aligned. I was hoping we could add the processor.event filter in the setup (middleware) but it might be better with an explicit helper. Something like:

const { index, processorEvent } = hasTransactionDurationMetrics
  ? // transactions are stored as pre-aggregated (histogram) docs in metric index
    getIndexAndProcessorEvent('metric', 'error')

  : // transactions are not pre-aggregated and stored in transaction index
    getIndexAndProcessorEvent('metric', 'error', transaction);

@dgieselaar
Copy link
Member Author

Based on some data analysis, we are likely not going to aggregate on user_agent.original as it greatly increases the number of metric documents that are stored. That means that we cannot migrate some of the RUM charts for now, and users that have RUM services and want to use those charts, will have to keep storing unsampled transactions until we find a fix for this issue. See elastic/apm-server#3841.

Comment on lines +107 to +109
const start =
'start' in query ? { start: moment.utc(query.start).valueOf() } : {};
const end = 'end' in query ? { end: moment.utc(query.end).valueOf() } : {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about:

Suggested change
const start =
'start' in query ? { start: moment.utc(query.start).valueOf() } : {};
const end = 'end' in query ? { end: moment.utc(query.end).valueOf() } : {};
const start = 'start' in query ? moment.utc(query.start).valueOf() : undefined;
const end = 'end' in query ? moment.utc(query.end).valueOf() : undefined;

Then you don't have to do start.start and end.end. I think start and end should always be returned, regardless if they are undefined

Comment on lines 111 to 134
const checkTransactionDurationMetrics = async () => {
const response = await client.search({
index: indices['apm_oss.metricsIndices'],
body: {
query: {
bool: {
filter: [
{ term: { [PROCESSOR_EVENT]: 'metric' } },
...(start.start && end.end
? [{ range: rangeFilter(start.start, end.end) }]
: []),
{ exists: { field: TRANSACTION_DURATION_HISTOGRAM } }
]
}
},
size: 0
},
terminateAfter: 1
});

return {
hasTransactionDurationMetrics: response.hits.total.value > 0
};
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps extract this to separate file to avoid making setup_request too overwhelming

end,
uiFiltersES,
indices,
hasTransactionDurationMetrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit about the naming: Is it important to note that it's TransactionDuration docs, and not just Transaction docs?

What about hasPreAggregatedTransactions or isPreAggregationEnabled (if we in the future want to pre-aggregate other data types, eg. errors ?

@kibanamachine
Copy link
Contributor

💔 Build Failed

Failed CI Steps


Test Failures

Kibana Pipeline / x-pack-intake-agent / X-Pack Jest Tests.x-pack/plugins/apm/common.Transaction TRANSACTION_DURATION_HISTOGRAM

Link to Jenkins

Standard Out

Failed Tests Reporter:
  - Test has not failed recently on tracked branches


Stack Trace

Error: expect(received).toMatchSnapshot()

New snapshot was not written. The update flag must be explicitly passed to write a new snapshot.

This is likely because this test is run in a continuous integration (CI) environment in which snapshots are not written by default.

Received value undefined
    at Object.it (/dev/shm/workspace/kibana/x-pack/plugins/apm/common/elasticsearch_fieldnames.test.ts:176:21)
    at Object.asyncJestTest (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/jasmineAsyncInstall.js:102:37)
    at resolve (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:43:12)
    at new Promise (<anonymous>)
    at mapper (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:26:19)
    at promise.then (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:73:41)
    at process._tickCallback (internal/process/next_tick.js:68:7)

Kibana Pipeline / x-pack-intake-agent / X-Pack Jest Tests.x-pack/plugins/apm/common.Span TRANSACTION_DURATION_HISTOGRAM

Link to Jenkins

Standard Out

Failed Tests Reporter:
  - Test has not failed recently on tracked branches


Stack Trace

Error: expect(received).toMatchSnapshot()

New snapshot was not written. The update flag must be explicitly passed to write a new snapshot.

This is likely because this test is run in a continuous integration (CI) environment in which snapshots are not written by default.

Received value undefined
    at Object.it (/dev/shm/workspace/kibana/x-pack/plugins/apm/common/elasticsearch_fieldnames.test.ts:176:21)
    at Object.asyncJestTest (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/jasmineAsyncInstall.js:102:37)
    at resolve (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:43:12)
    at new Promise (<anonymous>)
    at mapper (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:26:19)
    at promise.then (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:73:41)
    at process._tickCallback (internal/process/next_tick.js:68:7)

Kibana Pipeline / x-pack-intake-agent / X-Pack Jest Tests.x-pack/plugins/apm/common.Error TRANSACTION_DURATION_HISTOGRAM

Link to Jenkins

Standard Out

Failed Tests Reporter:
  - Test has not failed recently on tracked branches


Stack Trace

Error: expect(received).toMatchSnapshot()

New snapshot was not written. The update flag must be explicitly passed to write a new snapshot.

This is likely because this test is run in a continuous integration (CI) environment in which snapshots are not written by default.

Received value undefined
    at Object.it (/dev/shm/workspace/kibana/x-pack/plugins/apm/common/elasticsearch_fieldnames.test.ts:176:21)
    at Object.asyncJestTest (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/jasmineAsyncInstall.js:102:37)
    at resolve (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:43:12)
    at new Promise (<anonymous>)
    at mapper (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:26:19)
    at promise.then (/dev/shm/workspace/kibana/node_modules/jest-jasmine2/build/queueRunner.js:73:41)
    at process._tickCallback (internal/process/next_tick.js:68:7)

and 2 more failures, only showing the first 3.

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@dgieselaar
Copy link
Member Author

Superseded by #73953

@dgieselaar dgieselaar closed this Sep 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:enhancement Team:APM All issues that need APM UI Team support v7.9.2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[APM] Support latency metrics
5 participants