[CI] PainlessDomainSplitIT fails after multibucket #32966

Closed
benwtrent opened this issue Aug 17, 2018 · 5 comments · Fixed by #55349
Labels: :ml Machine learning, >test-failure Triaged test failures from CI

Comments

@benwtrent
Member

benwtrent commented Aug 17, 2018

To recreate

./gradlew :x-pack:qa:ml-single-node-tests:integTestRunner -Dtests.class=org.elasticsearch.xpack.ml.transforms.PainlessDomainSplitIT -Dtests.method="testHRDSplit"

Update: on master, after the restructure, the reproduce command is now:

./gradlew :x-pack:plugin:ml:qa:single-node-tests:integTestRunner -Dtests.class=org.elasticsearch.xpack.ml.transforms.PainlessDomainSplitIT -Dtests.method="testHRDSplit"

See failing build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request/15432/consoleFull

@benwtrent
Member Author

benwtrent commented Aug 17, 2018

In my investigation some strangeness arises when gathering records.

The test assumes that non-anomalous data is returned when gathering the record information. However, only the anomalous bar.com entries are ever returned.

See

// domainSplit() tests had subdomain, testHighestRegisteredDomainCases() do not

A simple fix would be to make test.hostname slightly anomalous so that it is returned by the records call. I am unsure whether this defeats the purpose of the test, however.

jasontedor added a commit to jasontedor/elasticsearch that referenced this issue Aug 18, 2018
* master:
  NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (elastic#32764)
  [DOCS] Splits the users API documentation into multiple pages (elastic#32825)
  [DOCS] Splits the token APIs into separate pages (elastic#32865)
  [DOCS] Creates redirects for role management APIs page
  Bypassing failing test PainlessDomainSplitIT#testHRDSplit (elastic#32966)
  TEST: Mute testRetentionPolicyChangeDuringRecovery
  [DOCS] Fixes more broken links to role management APIs
  [Docs] Tweaks and fixes to rollup docs
  [DOCS] Fixes links to role management APIs
  [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature
  [DOCS] Splits the roles API documentation into multiple pages (elastic#32794)
  [TEST]  Run pre 6.4 nodes in non-FIPS JVMs (elastic#32901)
  Make Geo Context Mapping Parsing More Strict (elastic#32821)
jasontedor added a commit that referenced this issue Aug 18, 2018
* elastic/master: (46 commits)
  NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (#32764)
  [DOCS] Splits the users API documentation into multiple pages (#32825)
  [DOCS] Splits the token APIs into separate pages (#32865)
  [DOCS] Creates redirects for role management APIs page
  Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966)
  TEST: Mute testRetentionPolicyChangeDuringRecovery
  [DOCS] Fixes more broken links to role management APIs
  [Docs] Tweaks and fixes to rollup docs
  [DOCS] Fixes links to role management APIs
  [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature
  [DOCS] Splits the roles API documentation into multiple pages (#32794)
  [TEST]  Run pre 6.4 nodes in non-FIPS JVMs (#32901)
  Make Geo Context Mapping Parsing More Strict (#32821)
  [ML] fix updating opened jobs scheduled events (#31651) (#32881)
  Scripted metric aggregations: add deprecation warning and system property to control legacy params (#31597)
  Tests: Fix timezone conversion in DateTimeUnitTests
  Enable FIPS140LicenseBootstrapCheck (#32903)
  Fix InternalAutoDateHistogram reproducible failure (#32723)
  Remove assertion in testDocStats on deletedDocs counter (#32914)
  HLRC: Move ML request converters into their own class (#32906)
  ...
jasontedor added a commit that referenced this issue Aug 18, 2018
* 6.x: (42 commits)
  [DOCS] Splits the users API documentation into multiple pages (#32825)
  [DOCS] Splits the token APIs into separate pages (#32865)
  [DOCS] Creates redirects for role management APIs page
  Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966)
  TEST: Mute testRetentionPolicyChangeDuringRecovery
  [DOCS] Fixes more broken links to role management APIs
  [Docs] Tweaks and fixes to rollup docs
  [DOCS] Fixes links to role management APIs
  [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature
  [DOCS] Splits the roles API documentation into multiple pages (#32794)
  [TEST]  Run pre 6.4 nodes in non-FIPS JVMs (#32901)
  Remove assertion in testDocStats on deletedDocs counter (#32914)
  [ML] fix updating opened jobs scheduled events (#31651) (#32881)
  Enable FIPS140LicenseBootstrapCheck (#32903)
  HLRC: Move ML request converters into their own class (#32906)
  [DOCS] Update getting-started.asciidoc (#29518)
  Fix allowed value for HighlighterBuilder encoder in javadocs (#32780)
  [DOCS] Add "remove a tag" script logic as an example (#32556)
  RFC: Test that example plugins build stand-alone (#32235)
  Security: remove put privilege API (#32879)
  ...
@dimitris-athanasiou dimitris-athanasiou self-assigned this Aug 20, 2018
@dimitris-athanasiou
Contributor

This test runs count by domain_split with a bucket_span of 1h. The data spans 100 buckets, with 1 doc indexed per bucket for a certain domain. In the 65th bucket, that domain is absent; instead, there are 100 docs of a different domain. The test expects 2 anomalies: one for the new domain, which has a count of 100, and one for the other domain, which has a count of zero when a count of one is expected.
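
The data layout described above can be sketched in a few lines of Python. This is an illustration only, not the test's actual code: the domain names, the start time, and the helper names are hypothetical placeholders. Only the bucket span, the bucket count, and the per-bucket document counts come from the description.

```python
from collections import Counter

BUCKET_SPAN_SECONDS = 3600  # bucket_span of 1h
NUM_BUCKETS = 100           # the data spans 100 buckets
ANOMALOUS_BUCKET = 64       # zero-based index of the 65th bucket

# Hypothetical placeholders; the real test uses its own domains and start time.
USUAL_DOMAIN = "usual.example"
NEW_DOMAIN = "new.example"
START_EPOCH_SECONDS = 0


def generate_docs():
    """Build one doc per bucket, except the 65th, which gets 100 docs of a new domain."""
    docs = []
    for bucket in range(NUM_BUCKETS):
        ts = START_EPOCH_SECONDS + bucket * BUCKET_SPAN_SECONDS
        if bucket == ANOMALOUS_BUCKET:
            docs.extend({"time": ts, "domain_split": NEW_DOMAIN} for _ in range(100))
        else:
            docs.append({"time": ts, "domain_split": USUAL_DOMAIN})
    return docs


def counts_per_bucket(docs):
    """Count docs per (bucket index, domain), mimicking 'count by domain_split'."""
    counts = Counter()
    for doc in docs:
        bucket = (doc["time"] - START_EPOCH_SECONDS) // BUCKET_SPAN_SECONDS
        counts[(bucket, doc["domain_split"])] += 1
    return counts


docs = generate_docs()
counts = counts_per_bucket(docs)
```

In this sketch, the two expected anomalies correspond to `counts[(64, NEW_DOMAIN)]` (100 where nothing was seen before) and `counts[(64, USUAL_DOMAIN)]` (zero where one is typical).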

After the multibucket feature was merged, we no longer see the anomaly for the missing domain. This definitely seems odd. @tveasey What are your thoughts on this one?

@benwtrent
Member Author

After some experiments: if the number of documents is 2 per bucket, we do get both anomalies, though the probability for the missing domain is extremely small.

{
   "job_id":"hrd-split-job",
   "result_type":"record",
   "probability":3.6470168589396805E-6,
   "record_score":60.098482474177416,
   "initial_record_score":60.098482474177416,
   "bucket_span":3600,
   "detector_index":0,
   "is_interim":false,
   "timestamp":1503460800000,
   "by_field_name":"domain_split",
   "by_field_value":"kerberos.http.192.168,62.222",
   "function":"count",
   "function_description":"count",
   "typical":[1.9850021750253195],
   "actual":[0.0],
   "domain_split":["kerberos.http.192.168,62.222"]
}
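
For anyone reading the record by hand, here is a small sketch that parses a trimmed copy of it and pulls out the fields relevant to this discussion. The variable names are my own, and the interpretation of `timestamp` as epoch milliseconds is an assumption based on its magnitude.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the record above; values are unchanged.
record_json = """
{
   "job_id": "hrd-split-job",
   "probability": 3.6470168589396805E-6,
   "record_score": 60.098482474177416,
   "bucket_span": 3600,
   "timestamp": 1503460800000,
   "by_field_name": "domain_split",
   "function": "count",
   "typical": [1.9850021750253195],
   "actual": [0.0]
}
"""

record = json.loads(record_json)

# Assumption: timestamp is in epoch milliseconds.
bucket_start = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)

typical = record["typical"][0]
actual = record["actual"][0]

print(f"bucket start : {bucket_start.isoformat()}")
print(f"typical count: {typical:.3f}, actual count: {actual:.1f}")
print(f"probability  : {record['probability']:.3e}")
```

With 2 docs per bucket the typical count is close to 2, and the actual count of 0 is what gets flagged.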

@tveasey
Contributor

tveasey commented Aug 24, 2018

Agreed, this does seem odd. I think this warrants investigation into what is happening inside the model. Based on Ben's last comment, I suspect this might be a numerical issue: something underflowing and not being handled properly (probably when taking a log). I'll look into this.
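
As a generic illustration of the class of problem suspected here (not the model's actual code, and not a confirmed diagnosis), the sketch below shows how multiplying very small probabilities can underflow to zero before a log is taken, and how summing in log space avoids it. The function names and the input values are hypothetical.

```python
import math


def naive_log_prob(log_terms):
    """Multiply raw probabilities, then take the log. The product can underflow to 0.0."""
    product = 1.0
    for log_term in log_terms:
        product *= math.exp(log_term)
    # math.log(0.0) raises ValueError, so an underflowed product must be handled explicitly.
    return math.log(product) if product > 0.0 else float("-inf")


def stable_log_prob(log_terms):
    """Stay in log space: the log of a product is the sum of the logs."""
    return math.fsum(log_terms)


# Two hypothetical log-probabilities; each exp() is representable, but their product is not.
log_terms = [-400.0, -400.0]

naive = naive_log_prob(log_terms)
stable = stable_log_prob(log_terms)
```

Here the naive version loses the value entirely, while the log-space version keeps it. If an underflowed value like this were silently treated as "no result", an anomaly could go unreported.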

@davidkyle
Member

This test still fails. Is it a problem with the test or a blocker for 6.5?

I updated the reproduce command above.

jaymode added a commit that referenced this issue Oct 22, 2018
All of the tests in PainlessDomainSplitIT have an awaitsfix, which
causes the build to fail since no tests are run. This adds an empty
test to get the build going again.

Relates #34683
Relates #32966
kcm pushed a commit that referenced this issue Oct 30, 2018
All of the tests in PainlessDomainSplitIT have an awaitsfix, which
causes the build to fail since no tests are run. This adds an empty
test to get the build going again.

Relates #34683
Relates #32966
benwtrent added a commit that referenced this issue Apr 17, 2020
This fixes the long muted testHRDSplit. Some minor adjustments for modern day elasticsearch changes :). 

The cause of the failure is that a new `by` field entering the model with an exceptionally high count does not cause an anomaly. We have since stopped combining the `rare` and `by` in this manner. New entries in a `by` field are not anomalous because we have no history on them yet. 

closes #32966
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Apr 17, 2020
This fixes the long muted testHRDSplit. Some minor adjustments for modern day elasticsearch changes :). 

The cause of the failure is that a new `by` field entering the model with an exceptionally high count does not cause an anomaly. We have since stopped combining the `rare` and `by` in this manner. New entries in a `by` field are not anomalous because we have no history on them yet. 

closes elastic#32966
4 participants