-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timing breakdown #588
Timing breakdown #588
Conversation
8d5c32c
to
d0e5d51
Compare
apm-agent-core/src/main/java/co/elastic/apm/agent/metrics/Labels.java
Outdated
Show resolved
Hide resolved
apm-agent-core/src/main/java/co/elastic/apm/agent/metrics/Labels.java
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## master #588 +/- ##
============================================
+ Coverage 63.82% 63.83% +<.01%
Complexity 68 68
============================================
Files 205 207 +2
Lines 8254 8496 +242
Branches 1056 1062 +6
============================================
+ Hits 5268 5423 +155
- Misses 2673 2741 +68
- Partials 313 332 +19
Continue to review full report at Codecov.
|
f6cc3ff
to
576f2da
Compare
@bmorelli25 the last commit contains documentation for the metrics. I'd love to get some feedback from you 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These docs look almost as good as your balcony in the summer ☀️
Co-Authored-By: Brandon Morelli <[email protected]>
Co-Authored-By: Brandon Morelli <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is very big and complicated.
So I did my best to follow and provide initial feedback.
We can continue the review together, adding notes here as necessary.
apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/AbstractSpan.java
Outdated
Show resolved
Hide resolved
apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/AbstractSpan.java
Outdated
Show resolved
Hide resolved
apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/Span.java
Show resolved
Hide resolved
apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/Span.java
Outdated
Show resolved
Hide resolved
timer.update(duration); | ||
if (finished) { | ||
// in case end()->trackMetrics() has been called concurrently | ||
// don't leak timers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not thread safe:
Thread A:
AbstractSpan#end() -> Transaction#doEnd -> trackMetrics() // all timers are reset, finished == false
Thread B:
AbstractSpan#end() -> Span#doEnd // creates an put a Timer, doesn't clean because finished == false
Thread A:
AbstractSpan#end() -> Transaction#doEnd (continue)
AbstractSpan#end() -> sets finished = true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will set finished = true
earlier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be fixed now
apm-agent-core/src/main/java/co/elastic/apm/agent/metrics/MetricRegistry.java
Show resolved
Hide resolved
apm-agent-core/src/main/java/co/elastic/apm/agent/metrics/MetricRegistry.java
Show resolved
Hide resolved
apm-agent-core/src/main/java/co/elastic/apm/agent/metrics/MetricSet.java
Show resolved
Hide resolved
* Returns a mutable {@link List}, roughly equal to the {@link #keySet()}. | ||
* <p> | ||
* Note that in concurrent scenarios, the key list may be a subset of the values of the respective {@link #keySet()}. | ||
* Entries added via the {@code compute*} family of methods are not reflected in the list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the compute
methods?
I think this list in general may not reflect the underlying keySet()
as it is not thread-safe with regard to the underlying map, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example java.util.concurrent.ConcurrentHashMap#computeIfAbsent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Synchronizing on add and remove operations should also make it thread-safe, right? That would make the map more applicable for other use-cases where modifying the list does not happen under contention.
* It can be used to iterate over the map's keys without allocating an {@link java.util.Iterator} | ||
*/ | ||
public class KeyListConcurrentHashMap<K, V> extends ConcurrentHashMap<K, V> { | ||
private final List<K> keyList = Collections.synchronizedList(new ArrayList<K>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure the map can only grow in size- throw unsupported operation exceptions from remove
and clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #588 (comment)
apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/Transaction.java
Outdated
Show resolved
Hide resolved
apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/Transaction.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't go thoroughly through tests, but mostly through everything else.
apm-agent-core/src/main/java/co/elastic/apm/agent/metrics/MetricRegistry.java
Outdated
Show resolved
Hide resolved
apm-agent-core/src/main/java/co/elastic/apm/agent/impl/transaction/Transaction.java
Outdated
Show resolved
Hide resolved
update(durationUs, 1); | ||
} | ||
|
||
public void update(long durationUs, long count) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that must be used within code that guarantees atomicity, or make atomic, so that can't be queried while updated
final Timer timer = spanTimings.get(spanType); | ||
if (timer.getCount() > 0) { | ||
labels.spanType(spanType.getSpanType()).spanSubType(spanType.getSpanSubType()); | ||
metricRegistry.updateTimer("span.self_time", labels, timer.getTotalTimeUs(), timer.getCount()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, Timer update is not atomic, so reading here is not guaranteed to get fully updated Timer state (ie updated halfway by incrementTimer
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should now be fixed
} | ||
|
||
@Override | ||
public void resetState() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same- not atomic and may occur concurrently with reads or other writes
if (timer.getCount() > 0) { | ||
labels.spanType(spanType.getSpanType()).spanSubType(spanType.getSpanSubType()); | ||
metricRegistry.updateTimer("span.self_time", labels, timer.getTotalTimeUs(), timer.getCount()); | ||
timer.resetState(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same- may be executed concurrently with updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should now be fixed
serializeValue(key, ".count", timer.getCount(), jw); | ||
jw.writeByte(JsonWriter.COMMA); | ||
serializeValue(key, ".sum", timer.getTotalTimeMs(), jw); | ||
timer.resetState(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these resets seem awkward here, much better fit in metricSet.onAfterReport()
. Is this done here in order to prevent additional iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that the idea was to prevent another iteration. But it's probably better to trade off in favor of readability here.
...agent-core/src/main/java/co/elastic/apm/agent/report/serialize/MetricRegistrySerializer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A really epic PR! 👏 👏
I think this change should come with a visible warning that high cardinality of span types and subtypes is highly discouraged as it will make this feature useless. It will also result with increased resource consumption (memory, cpu and bandwidth) but not sure how significant that would be.
@@ -48,6 +60,9 @@ | |||
*/ | |||
private final TransactionContext context = new TransactionContext(); | |||
private final SpanCount spanCount = new SpanCount(); | |||
// type: subtype: timer | |||
private final KeyListConcurrentHashMap<String, KeyListConcurrentHashMap<String, Timer>> spanTimings = new KeyListConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment that this map does not get recycled and instead accumulates all span types and subtypes.
Maybe an additional ALL-CAPITALS comment about that in the resetState
method, so to not let anyone fall for this one.
protected AtomicInteger references = new AtomicInteger(); | ||
protected volatile boolean finished = true; | ||
|
||
public int getReferenceCount() { | ||
return references.get(); | ||
} | ||
|
||
private static class ReentrantTimer implements Recyclable { | ||
|
||
private AtomicInteger nestingLevel = new AtomicInteger(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this name confusing. I thought it is about nested children, but it is more about counting concurrent-updaters of only first level (ie direct children) spans.
closes #556
depends on #587, elastic/apm-server#2086
TODO
Reportself_time
on transactions/spans?Labels#immutableCopy
Open questions
vs monotonically incrementTimer
counters