Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metadata Immutability] Change different indices lookup objects from array type to lists #14723
base: main
Are you sure you want to change the base?
[Metadata Immutability] Change different indices lookup objects from array type to lists #14723
Changes from all commits
d2289d2
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Check warning on line 521 in server/src/main/java/org/opensearch/action/admin/cluster/health/TransportClusterHealthAction.java
Codecov / codecov/patch
server/src/main/java/org/opensearch/action/admin/cluster/health/TransportClusterHealthAction.java#L521
Check warning on line 339 in server/src/main/java/org/opensearch/action/admin/indices/datastream/GetDataStreamAction.java
Codecov / codecov/patch
server/src/main/java/org/opensearch/action/admin/indices/datastream/GetDataStreamAction.java#L339
Check warning on line 709 in server/src/main/java/org/opensearch/cluster/metadata/IndexNameExpressionResolver.java
Codecov / codecov/patch
server/src/main/java/org/opensearch/cluster/metadata/IndexNameExpressionResolver.java#L709
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be expensive operation to copy the set to a list every time for index resolution.
Have run micro benchmark on your changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here ConcreteAllIndices(allIndices) are unmodifiableSet, so List.copyOf() will not really create a copy .
There is an implementation Note in javadoc of this API - "If the given Collection is an unmodifiable List, calling copyOf will generally not create a copy". Also if we look in to the internal implementation of Lisy.copyOf() it's clear that, it's not creating a copy if collection is an AbstractImmutableList.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are doing
Collections.unmodifiableSet(allIndices);
which creates new object ofUnmodifiableSet
.UnmodifiableSet
extendsUnmodifiableCollection
and it is not extendingAbstractImmutableList
. It will end up copying the elements. Can you cross check?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct. So what do you suggest here?
concrete indices treated as Set in the existing implementation -
OpenSearch/server/src/main/java/org/opensearch/cluster/metadata/Metadata.java
Line 1636 in 67a2e4c
That's why I kept only concrete indices as Set and all other indices as List. If we keep concrete indices also as List, then this conversion won't be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes right, lets check the usage in the code if it has to be Set, else i agree keeping it as list would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I performed the performance testing as @sandeshkr419 suggested below.
I created an index and 50000 aliases initially, then again created 500*10 = 5000 aliases (so total 55000) and calculated the average time taken for these last 500 sets of alias creation separately. The results shows that there is a consistent improvement in each sets(~10ms) after this change.
Result:
Before the change:
Average of 1st 500 sets - 102ms
Average of 2nd 500 sets - 102ms
Average of 3rd 500 sets - 103ms
Average of 4th 500 sets - 103ms
Average of 5th 500 sets - 107ms
Average of 6th 500 sets - 107ms
Average of 7th 500 sets - 106ms
Average of 8th 500 sets - 108ms
Average of 9th 500 sets - 110ms
Average of 10th 500 sets - 109ms
After the change:
Average of 1st 500 sets - 92ms
Average of 2nd 500 sets - 92ms
Average of 3rd 500 sets - 91ms
Average of 4th 500 sets - 92ms
Average of 5th 500 sets - 93ms
Average of 6th 500 sets - 94ms
Average of 7th 500 sets - 95ms
Average of 8th 500 sets - 95ms
Average of 9th 500 sets - 97ms
Average of 10th 500 sets - 97ms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sandeshkr419 @andrross @shwetathareja Please review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @akolarkunnu for performing the perf runs. Since there is a change in ClusterStateHealth, we might need a similar run for
health
API calls. Could we also capture the memory allocations corresponding to the duration of run and compare them before / after change ?Also, trying to understand as to why 50K aliases were created and not the actual index ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also create indices, its just you need to modify node limits on max indices/shards per node - plus the the change is cluster state is identical (as in same/similar objects are modified).
I don't think we would be able to capture changes (improvements actually) here, the only thing we have omitted here is back and worth list-set conversions for some objects (stored as one type, retrieved as different). [Recalling from my own PR when I had tried to benchmark some changes in cluster state]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Health API iterates over the IndexMetadata to compute the overall cluster health. Hence i was thinking that creating indexes rather than alias would have more entries to iterate for health computation. For cluster health computation, there were less memory allocations seen when it was switched to size based array iteration. Ref PR :- #15492
Please take a look at the memory allocations in flame graph in the referenced PR. Could we verify that switching from array to set iterator does not incur additional memory allocations for health API.
Check warning on line 168 in server/src/main/java/org/opensearch/index/recovery/RemoteStoreRestoreService.java
Codecov / codecov/patch
server/src/main/java/org/opensearch/index/recovery/RemoteStoreRestoreService.java#L168