Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rollup] Disallow index patterns that match rollup indices #30491

Merged
merged 5 commits into from
Jun 5, 2018

Conversation

polyfractal
Copy link
Contributor

We should not allow the user to configure index patterns that also match rollup indices (either the index specified in the config, or previously created jobs).

Matching against rollup indices will almost certainly lead to bad behavior, because the expected fields in the user index won't exist in the rollup (due to renaming). It is also unlikely a user will want to match against a rollup index.

For example, it is quite natural for a user to specify metricbeat-* as the index pattern, and then store the rollups in metricbeat-rolled. This will start throwing errors as soon as the rollup index is created because the indexer will try to search it.

While we're at it, we should forbid the index pattern being equal to the rollup index too.

/cc @cdahlqvist

We should not allow the user to configure index patterns that also match
rollup indices (either the index specified in the config, or previously
created jobs).

Matching against rollup indices will almost certainly lead to bad behavior,
because the expected fields in the user index won't exist in the rollup
(due to renaming).  It is also unlikely a user will want to match
against a rollup index.

For example, it is quite natural for a user to specify `metricbeat-*`
as the index pattern, and then store the rollups in `metricbeat-rolled`.
This will start throwing errors as soon as the rollup index is created
because the indexer will try to search it.
@polyfractal polyfractal added >enhancement review :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data labels May 9, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, the index pattern should not match the rollup index name, this is an important validation. However I wonder if the second validation is really needed/possible. What happens if you define a rollup index that match an index pattern of another job ?

}
if (Regex.isSimpleMatchPattern(indexPattern)) {
if (Regex.simpleMatch(indexPattern, rollupIndex)) {
throw new IllegalArgumentException("Index pattern would match rollup index name which is not allowed.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch !

/**
* Ensure that the index pattern doesn't match any pre-existing rollup indices. We don't need to check
* if the pattern would match it's own rollup index because the request already validated that.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can be less strict and only forbid mixing rollup and non-rollup indices in the same pattern ?
It should be fine to create the job if the pattern matches rollup indices only. Although I wonder if the problem of mixing indices (rollup and non-rollup) is not caught by the field validation. As you mentioned they have different field names so the validation of field names should fail all the time ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about the field name validation, although i'm still a bit worried about unintended consequences of matching against a rollup index. Barring the "rollup a rollup" usecase, it doesn't seem wise to let users match against an existing rollup if we can help it?

But maybe it's not worth the extra complexity? If it matches another rollup index, that index will exist so we can validate against the fields and it should throw an exception. It also doesn't protect against future rollup indices from matching the defined pattern, so we'd be relying on field level validation there anyway.

I think I'm convinced. Will make the changes, should simplify the PR a bunch :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it turns out to be a bit more complicated. The field-level validation is broken, due to how field caps behaves.

Field caps only tells you the caps for the fields that match, in the resolved indices. But it doesn't tell you about the indices that were resolved but didn't match any field. So we won't know the index pattern matched some random index without the correct fields.

That's unrelated to this change to this PR, but needs to be fixed for us to simplify. Going to see what it takes, we may have to manually resolve the indices anyway to check. Will keep you updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We can also add the matching indices in field caps's response. Could be per field : the list of index that doesn't have a definition for the field or global: the list of indices that match the pattern and then you can infer which index is missing from the field's response. I think it's a valid feature to add even if we don't use it in rollups.

@polyfractal
Copy link
Contributor Author

@jimczi I pushed a commit to prune this back to just validating that the pattern doesn't self-match. I'll follow up with the field caps limitation and see what we can do there, then circle back to Rollup and add a few tests to make sure the field-name validation works as expected after we tweak field caps.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @polyfractal

@polyfractal
Copy link
Contributor Author

Jenkins, run gradle build tests

@polyfractal polyfractal merged commit a1c9def into elastic:master Jun 5, 2018
polyfractal added a commit that referenced this pull request Jun 5, 2018
We should not allow the user to configure index patterns that also match
the index which stores the rollup index.

For example, it is quite natural for a user to specify `metricbeat-*`
as the index pattern, and then store the rollups in `metricbeat-rolled`.
This will start throwing errors as soon as the rollup index is created
because the indexer will try to search it.

Note: this does not prevent the user from matching against existing
rollup indices.  That should be prevented by the field-level validation
during job creation.
jasontedor added a commit that referenced this pull request Jun 5, 2018
* elastic/master:
  [DOCS] Creates rest-api folder in docs
  [Rollup] Disallow index patterns that match the rollup index (#30491)
  Add cors support to NioHttpServerTransport (#30827)
  [DOCS] Fixes security example (#31082)
  Allow terms query in _rollup_search (#30973)
jasontedor added a commit that referenced this pull request Jun 5, 2018
* elastic/6.x:
  [Rollup] Disallow index patterns that match the rollup index (#30491)
  Revert "Fixing MixedClusterClientYamlTestSuiteIT"
  Fixing MixedClusterClientYamlTestSuiteIT
  Add get mappings support to high-level rest client (#30889)
  [DOCS] Fixes security example (#31082)
  Allow terms query in _rollup_search (#30973)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 5, 2018
* ccr:
  [DOCS] Creates rest-api folder in docs
  [Rollup] Disallow index patterns that match the rollup index (elastic#30491)
  Replace exact numDocs by soft-del count in SegmentInfo (elastic#31086)
  Upgrade to Lucene-7.4.0-snapshot-0a7c3f462f (elastic#31073)
  Add cors support to NioHttpServerTransport (elastic#30827)
  [DOCS] Fixes security example (elastic#31082)
  Allow terms query in _rollup_search (elastic#30973)
  Removing erroneous repeat
  Adapt bwc versions after backporting elastic#30983 to 6.4
  [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient
  TEST:  Retry synced-flush if ongoing ops on primary (elastic#30978)
  Fix docs build.
  Only auto-update license signature if all nodes ready (elastic#30859)
  Add BlobContainer.writeBlobAtomic() (elastic#30902)
  Add a doc value format to binary fields. (elastic#30860)
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Jun 5, 2018
* ccr:
  [DOCS] Creates rest-api folder in docs
  [Rollup] Disallow index patterns that match the rollup index (elastic#30491)
  Replace exact numDocs by soft-del count in SegmentInfo (elastic#31086)
  Upgrade to Lucene-7.4.0-snapshot-0a7c3f462f (elastic#31073)
  Add cors support to NioHttpServerTransport (elastic#30827)
  [DOCS] Fixes security example (elastic#31082)
  Allow terms query in _rollup_search (elastic#30973)
  Removing erroneous repeat
  Adapt bwc versions after backporting elastic#30983 to 6.4
  [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient
  TEST:  Retry synced-flush if ongoing ops on primary (elastic#30978)
  Fix docs build.
  Only auto-update license signature if all nodes ready (elastic#30859)
  Add BlobContainer.writeBlobAtomic() (elastic#30902)
  Add a doc value format to binary fields. (elastic#30860)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants