[APM] Metrics powered UI POC #66871
Conversation
Pinging @elastic/apm-ui (Team:apm)
const index = [
  indices['apm_oss.metricsIndices'],
  indices['apm_oss.errorIndices']
Are there any plans to pre-aggregate errors in the future?
if (!hasTransactionDurationMetrics) {
  index.push(indices['apm_oss.transactionIndices']);
}
Instead of having this for every query, what about letting the APM Elasticsearch client handle it? The consumer side specifies the indices that should be queried as strings (TypeScript can enforce correctness):

{
  apmIndices: ['transaction', 'error'],
  body: {...}
}

Then the client will determine which indices should be queried according to hasTransactionDurationMetrics:
const index = apmIndices.map(apmIndex => {
  switch (apmIndex) {
    case 'span':
      return indices['apm_oss.spanIndices'];
    case 'error':
      return indices['apm_oss.errorIndices'];
    case 'metric':
      return indices['apm_oss.metricsIndices'];
    case 'transaction':
      return hasTransactionDurationMetrics
        ? indices['apm_oss.metricsIndices']
        : indices['apm_oss.transactionIndices'];
  }
});
We need this client anyway to make it easier when teams outside APM want to query APM data (SIEM, for instance).
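A minimal sketch of what such a client wrapper could look like. The names (apmSearch, APMIndexName, SearchClient) and the dependency shape are illustrative assumptions, not code from this PR:

```ts
// Hypothetical sketch of an APM-aware search client wrapper; names and shapes
// are assumptions, not code from this PR.
type APMIndexName = 'transaction' | 'span' | 'error' | 'metric';

interface SearchClient {
  search(params: { index: string[]; body: unknown }): Promise<unknown>;
}

interface APMSearchDeps {
  client: SearchClient;
  indices: Record<string, string>;
  hasTransactionDurationMetrics: boolean;
}

function apmSearch(
  { apmIndices, body }: { apmIndices: APMIndexName[]; body: unknown },
  { client, indices, hasTransactionDurationMetrics }: APMSearchDeps
) {
  // Resolve logical APM index names to concrete index patterns. The
  // 'transaction' case is routed to the metrics index when pre-aggregated
  // transaction metrics exist for the requested time range.
  const index = apmIndices.map((apmIndex) => {
    switch (apmIndex) {
      case 'span':
        return indices['apm_oss.spanIndices'];
      case 'error':
        return indices['apm_oss.errorIndices'];
      case 'metric':
        return indices['apm_oss.metricsIndices'];
      case 'transaction':
        return hasTransactionDurationMetrics
          ? indices['apm_oss.metricsIndices']
          : indices['apm_oss.transactionIndices'];
    }
  });

  return client.search({ index, body });
}

// Example call: apmSearch({ apmIndices: ['transaction', 'error'], body: { ... } }, setup);
```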
filter: [
  {
    terms: { [PROCESSOR_EVENT]: ['transaction', 'error', 'metric'] }
  },
Removing the processor.event filter could cause problems because by default we query apm-* (all indices), so this query will return transactions, metrics, errors and spans by default. I think you'll need the same logic as for the index.

I think the queried indices and processor.event should always be aligned. I was hoping we could add the processor.event filter in the setup (middleware), but it might be better with an explicit helper. Something like:
const { index, processorEvent } = hasTransactionDurationMetrics
  ? // transactions are stored as pre-aggregated (histogram) docs in the metric index
    getIndexAndProcessorEvent('metric', 'error')
  : // transactions are not pre-aggregated and are stored in the transaction index
    getIndexAndProcessorEvent('metric', 'error', 'transaction');
Based on some data analysis, we are likely not going to aggregate on …
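A minimal sketch of what such a getIndexAndProcessorEvent helper might look like. The signature is not defined in this PR; the explicit indices parameter is an assumption added here only to keep the example self-contained (the real helper would likely read it from the request setup):

```ts
// Hypothetical helper: given a list of processor.event values, return both the
// concrete index patterns and the matching processor.event values so the
// queried indices and the processor.event filter can never drift apart.
type ProcessorEvent = 'transaction' | 'span' | 'error' | 'metric';

function getIndexAndProcessorEvent(
  indices: Record<string, string>,
  ...processorEvents: ProcessorEvent[]
) {
  const indexMap: Record<ProcessorEvent, string> = {
    transaction: indices['apm_oss.transactionIndices'],
    span: indices['apm_oss.spanIndices'],
    error: indices['apm_oss.errorIndices'],
    metric: indices['apm_oss.metricsIndices']
  };

  return {
    // indices to pass to the search request
    index: processorEvents.map((event) => indexMap[event]),
    // values for the processor.event terms filter
    processorEvent: processorEvents
  };
}
```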
const start =
  'start' in query ? { start: moment.utc(query.start).valueOf() } : {};
const end = 'end' in query ? { end: moment.utc(query.end).valueOf() } : {};
What about:
- const start =
-   'start' in query ? { start: moment.utc(query.start).valueOf() } : {};
- const end = 'end' in query ? { end: moment.utc(query.end).valueOf() } : {};
+ const start = 'start' in query ? moment.utc(query.start).valueOf() : undefined;
+ const end = 'end' in query ? moment.utc(query.end).valueOf() : undefined;
Then you don't have to do start.start and end.end. I think start and end should always be returned, regardless of whether they are undefined.
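For illustration, a sketch of the suggested shape (not the PR's actual code; the query object and the '@timestamp' range clause are only stand-ins to show how a consumer would use the plain-number values):

```ts
// Sketch of the suggested start/end handling with plain numbers or undefined.
import moment from 'moment';

const query: { start?: string; end?: string } = {
  start: '2020-05-18T00:00:00.000Z',
  end: '2020-05-19T00:00:00.000Z'
};

const start = 'start' in query ? moment.utc(query.start).valueOf() : undefined;
const end = 'end' in query ? moment.utc(query.end).valueOf() : undefined;

// Consumers read start/end directly (no start.start / end.end) and only add a
// range filter when both bounds are present:
const maybeRangeFilter =
  start !== undefined && end !== undefined
    ? [{ range: { '@timestamp': { gte: start, lte: end, format: 'epoch_millis' } } }]
    : [];
```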
const checkTransactionDurationMetrics = async () => {
  const response = await client.search({
    index: indices['apm_oss.metricsIndices'],
    body: {
      query: {
        bool: {
          filter: [
            { term: { [PROCESSOR_EVENT]: 'metric' } },
            ...(start.start && end.end
              ? [{ range: rangeFilter(start.start, end.end) }]
              : []),
            { exists: { field: TRANSACTION_DURATION_HISTOGRAM } }
          ]
        }
      },
      size: 0
    },
    terminateAfter: 1
  });

  return {
    hasTransactionDurationMetrics: response.hits.total.value > 0
  };
};
Perhaps extract this to a separate file to avoid making setup_request too overwhelming.
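A sketch of what that extraction might look like; the module name, the shapes of client and indices, and the inlined range clause (standing in for the existing rangeFilter helper) are assumptions, not code from this PR:

```ts
// Hypothetical module, e.g. has_transaction_duration_metrics.ts, extracted
// from setup_request. Checks whether any pre-aggregated transaction duration
// metric documents exist in the metrics indices for the given time range.
const PROCESSOR_EVENT = 'processor.event';
const TRANSACTION_DURATION_HISTOGRAM = 'transaction.duration.histogram';

interface SearchClient {
  search(params: {
    index: string;
    body: unknown;
    terminateAfter?: number;
  }): Promise<{ hits: { total: { value: number } } }>;
}

export async function hasTransactionDurationMetrics({
  client,
  metricsIndex,
  start,
  end
}: {
  client: SearchClient;
  metricsIndex: string;
  start?: number;
  end?: number;
}): Promise<boolean> {
  const response = await client.search({
    index: metricsIndex,
    body: {
      size: 0,
      query: {
        bool: {
          filter: [
            { term: { [PROCESSOR_EVENT]: 'metric' } },
            // stand-in for the existing rangeFilter helper
            ...(start !== undefined && end !== undefined
              ? [{ range: { '@timestamp': { gte: start, lte: end, format: 'epoch_millis' } } }]
              : []),
            { exists: { field: TRANSACTION_DURATION_HISTOGRAM } }
          ]
        }
      }
    },
    // stop as soon as a single matching document is found
    terminateAfter: 1
  });

  return response.hits.total.value > 0;
}
```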
end,
uiFiltersES,
indices,
hasTransactionDurationMetrics
Nit about the naming: is it important to note that these are TransactionDuration docs, and not just Transaction docs? What about hasPreAggregatedTransactions or isPreAggregationEnabled (if we in the future want to pre-aggregate other data types, e.g. errors)?
💔 Build Failed

Failed CI Steps / Test Failures:
- Kibana Pipeline / x-pack-intake-agent / X-Pack Jest Tests.x-pack/plugins/apm/common.Transaction TRANSACTION_DURATION_HISTOGRAM
- Kibana Pipeline / x-pack-intake-agent / X-Pack Jest Tests.x-pack/plugins/apm/common.Span TRANSACTION_DURATION_HISTOGRAM
- Kibana Pipeline / x-pack-intake-agent / X-Pack Jest Tests.x-pack/plugins/apm/common.Error TRANSACTION_DURATION_HISTOGRAM

and 2 more failures, only showing the first 3.

To update your PR or re-run it, just comment with:
Superseded by #73953
This is a POC to demonstrate some of the work we need to do to migrate to a metrics powered UI. The approach is:
- In setup_request, determine if, for the given time range, the cluster has 1 or more transaction duration metricsets
- If it does, query transaction.duration.histogram rather than transaction.duration.us

I've opted for this approach because there are several scenarios to consider, and this is the simplest decision tree I can think of. For instance, the user could have transaction duration metrics but be looking at an earlier time range, or they could be using the query bar and filtering on non-aggregated fields.
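For example (a sketch, not code from this PR), a duration aggregation would switch fields roughly like this:

```ts
// Sketch only: pick the duration field based on the setup_request check.
// Elasticsearch metric aggregations such as avg and percentiles accept
// histogram-typed fields, so the same aggregation shape works in both modes.
declare const hasTransactionDurationMetrics: boolean;

const durationField = hasTransactionDurationMetrics
  ? 'transaction.duration.histogram' // pre-aggregated histogram docs in the metrics index
  : 'transaction.duration.us'; // raw transaction documents

const aggs = {
  avg_duration: { avg: { field: durationField } },
  duration_percentiles: { percentiles: { field: durationField, percents: [95, 99] } }
};
```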
I've also split up the services aggregations into smaller requests (which is recommended to better utilize parallelism).
I haven't done any performance comparisons yet.
To try this out, you'll need to enable aggregation in APM Server: -E apm-server.aggregation.enabled=true. See elastic/apm-server#3651.

Before:
After:
(Some data is off because some of the agents were not running for the full 24 hours.)