Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Star tree index config changes #13917

Closed

Conversation

bharath-techie
Copy link
Contributor

@bharath-techie bharath-techie commented May 31, 2024

Description

This PR contains the changes for star tree index config with feature flag protection. To make the changes extensible to other multi-field/composite/datacube type of indices in future, I've generalized the implementation under 'CompositeIndex'.

CompositeIndexConfig will be part of index settings, since it contains multiple fields, similar to IndexSortConfig.

Minimal configuration :

Minimal config user can provide will be :

  • The name of the dimension fields in order ( The order affects the performance and size of the star tree index)
  • If dimension is of non default type such as date , then user needs to specify the type so that we can parse additional info such as intervals
  • Metric fields.
{
    "settings": {
        "index.composite_index.config": {
            "composite_config_1": {
                "dimensions_order": [
                    "status",
                    "timestamp"
                ],
                "metrics": [
                    "size"
                ],
                "dimensions_config": {
                    "timestamp": {
                        "field": "@timestamp",
                        "type": "date"
                    }
                }
            }
        }
    }
}

And the defaults will be filled for the above fields.
NOTE : We will tune the defaults throughout the development of the star tree index.

Defaults :

Timestamp field intervals = [ Minute, Hour ]
Default Metrics for each metric field = [ SUM, COUNT, AVG, MIN, MAX ]

Expanded configuration :

{
    "settings.index.compsosite_index.config": {
    
        // will use the keys as the name of the composite fields
        "composite_config_1": {
             
            // Optional, star tree is default 
            "index_mode": "startree",
            
            
            // Mandatory
            "dimensions_order": [
                "status",
                "price",
                "price-1000-interval",
                "timestamp"
            ],
            
            
            // Mandatory
            "metrics": [
                "request_size",
                "header_size"
            ]
        },
        
        
        // optional config to tune star tree specific parameters
        "startree_config": {
        
       
            // will be internal setting for testing / special use cases - default is off heap
            "build_mode": "on_heap",
            
            
            // Optional - default comes from cluster level setting
            "max_leaf_records": 100,
            
            
            // Optional - decide to keep/remove post experimental mode based on this config's usefulness
            "skip_star_node_creation_for_dimensions": [
                "@timestamp",
                "status"
            ],
        },
        
        
        // config for dimension overrides and specifying date field
        "dimensions_config": {
            "timestamp": {
            
                // optional - if user doesn't specify , take the key name as field
                "field": "@timestamp",               
                
                // Mandatory if user needs to tune the field's parameters such as calendar_interval
                "type": "date",            
                
                // P1 - can be added later based on feedback
                "fixed_interval": [
                    "1h",
                    "4h",
                    "1d"
                ],                
                
                // optional - defaults comes from cluster level setting
                "calendar_interval": [
                    "1d",
                    "1m",
                    "1y"
                ]
            },           
            
            // P1 - hisogram field - post milestone 1 if there are use cases / optimizations [ right now there is no support ]
            "price-1000-interval": {
                "field": "price",
                "interval": "1000"
            }
        },       
        
        // Optional config for metric overrides 
        "metrics_config": {
            "request_size": {
                "field": "request_size",
                
                
                // optional - default comes from cluster settings, all 5 metrics will be added as default
                "metrics": [
                    "sum",
                    "avg",
                    "min",
                    "max"
                ]
            },
            "header_size": {
                "field": "header_size",
                "metrics": [
                    "sum",
                    "avg"
                ]
            }
        }
    },
    "composite_config_2": {
                ...
    }
}


Validations

Apart from basic validations based on user input :

  • We will start with support for one composite field per config , so technically one star tree index per source index.
  • Maximum of 10 dimensions [will fine tune this]
  • For date fields - maximum for 3 intervals
  • All dimension fields and metric fields must be aggregation compatible [ doc values + field data supported ]
  • We will add a limit on number of metrics later on.

Open questions

  • Saw sort valiadations on 'shrink index' api etc - where I'll also add similar validations. What other index APIs we need to restrict creation / support of star tree ?

Related Issues

#12498
#13875

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for 8808c6e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 11d7c4f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Jun 1, 2024

❌ Gradle check result for 9766fce: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Jun 1, 2024

❌ Gradle check result for ed5ad3e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Jun 6, 2024

❌ Gradle check result for 3af0dc0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@bharath-techie bharath-techie force-pushed the startreeconfig branch 2 times, most recently from 3576504 to 4cce0d2 Compare June 6, 2024 12:00
Copy link
Contributor

github-actions bot commented Jun 6, 2024

❌ Gradle check result for 3576504: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Jun 6, 2024

❌ Gradle check result for 4cce0d2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Jun 6, 2024

❌ Gradle check result for 64b3559: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Jun 7, 2024

✅ Gradle check result for 2018b0c: SUCCESS

Copy link

codecov bot commented Jun 7, 2024

Codecov Report

Attention: Patch coverage is 79.81073% with 64 lines in your changes missing coverage. Please review.

Project coverage is 71.71%. Comparing base (b15cb0c) to head (2018b0c).
Report is 367 commits behind head on main.

Current head 2018b0c differs from pull request most recent head dce8194

Please upload reports for the commit dce8194 to get more accurate results.

Files Patch % Lines
...rch/index/compositeindex/CompositeIndexConfig.java 76.95% 36 Missing and 17 partials ⚠️
...va/org/opensearch/index/compositeindex/Metric.java 66.66% 3 Missing ⚠️
...in/java/org/opensearch/indices/IndicesService.java 70.00% 2 Missing and 1 partial ⚠️
...rg/opensearch/index/compositeindex/MetricType.java 85.71% 1 Missing and 1 partial ⚠️
...search/index/compositeindex/StarTreeFieldSpec.java 88.88% 1 Missing and 1 partial ⚠️
...opensearch/index/compositeindex/DateDimension.java 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13917      +/-   ##
============================================
+ Coverage     71.42%   71.71%   +0.29%     
- Complexity    59978    61659    +1681     
============================================
  Files          4985     5088     +103     
  Lines        282275   289388    +7113     
  Branches      40946    41886     +940     
============================================
+ Hits         201603   207544    +5941     
- Misses        63999    64687     +688     
- Partials      16673    17157     +484     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

github-actions bot commented Jun 7, 2024

❌ Gradle check result for 5eca7aa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

… setting at cluster level

Signed-off-by: Bharathwaj G <[email protected]>
Copy link
Contributor

github-actions bot commented Jun 7, 2024

❌ Gradle check result for dce8194: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

public class StarTreeFieldSpec implements CompositeFieldSpec {

private final AtomicInteger maxLeafDocs = new AtomicInteger();
private final List<String> skipStarNodeCreationInDims;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we will need to expose these fields via getters so that we can skip these during star tree creation. Build Mode will help us switch between onheap and offheap modes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants