[CI] PainlessDomainSplitIT fails after multibucket #32966

benwtrent · 2018-08-17T20:21:09Z

To reproduce:

./gradlew :x-pack:qa:ml-single-node-tests:integTestRunner -Dtests.class=org.elasticsearch.xpack.ml.transforms.PainlessDomainSplitIT -Dtests.method="testHRDSplit"

Update: on master, after the restructure, the reproduce command is now:

./gradlew :x-pack:plugin:ml:qa:single-node-tests:integTestRunner -Dtests.class=org.elasticsearch.xpack.ml.transforms.PainlessDomainSplitIT -Dtests.method="testHRDSplit"

See failing build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request/15432/consoleFull


benwtrent · 2018-08-17T20:34:14Z

In my investigation some strangeness arises when gathering records.

The test assumes that non-anomalous data is returned when gathering the record information. However, only the anomalous bar.com entries are ever returned.

See elasticsearch/x-pack/qa/ml-single-node-tests/src/test/java/org/elasticsearch/xpack/ml/transforms/PainlessDomainSplitIT.java, line 231 in 647705e:

// domainSplit() tests had subdomain, testHighestRegisteredDomainCases() do not

A simple way to fix this would be to make test.hostname slightly anomalous so that it is returned in the records call. I am unsure whether this defeats the purpose of the test, however.


dimitris-athanasiou · 2018-08-20T10:59:54Z

This test does a count by domain_split with a bucket_span of 1h. The data spans 100 buckets, and the test indexes 1 doc per bucket with a certain domain. In the 65th bucket that domain is absent; instead, there are 100 docs of a different domain. The test expects 2 anomalies: one for the new domain, which has a count of 100, and one for the original domain, which has a count of zero when a count of one is expected.
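To make the data layout described above concrete, here is a minimal Java sketch of the per-bucket counts. It is not the test's actual indexing code: the class name, the method name, and the domain `usual.example` are hypothetical placeholders, and `bar.com` is used only because it is the anomalous domain mentioned earlier in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BucketLayoutSketch {
    public static final int BUCKETS = 100;          // the data spans 100 one-hour buckets
    public static final int ANOMALOUS_BUCKET = 65;  // 1-based: "the 65th bucket"

    /** Returns the document count per domain for a 1-based bucket number. */
    public static Map<String, Integer> countsForBucket(int bucket, String usualDomain, String newDomain) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (bucket == ANOMALOUS_BUCKET) {
            counts.put(usualDomain, 0);   // the usual domain is missing here
            counts.put(newDomain, 100);   // 100 docs of a different domain instead
        } else {
            counts.put(usualDomain, 1);   // one doc per bucket everywhere else
            counts.put(newDomain, 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print only the buckets around the anomaly; domain names are placeholders.
        for (int bucket = 1; bucket <= BUCKETS; bucket++) {
            if (bucket >= 64 && bucket <= 66) {
                System.out.println("bucket " + bucket + ": "
                        + countsForBucket(bucket, "usual.example", "bar.com"));
            }
        }
    }
}
```

The two expected anomalies correspond to the two entries of the map returned for bucket 65: a count of 100 for a previously unseen value, and a count of 0 where 1 is typical.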

After the multibucket feature was merged, we no longer see the anomaly for the missing domain. This definitely seems odd. @tveasey What are your thoughts on this one?

benwtrent · 2018-08-20T12:06:16Z

Doing some experiments: if the number of documents is 2 per bucket, then we do get both anomalies, though the probability for the missing domain is extremely small.

{
   "job_id":"hrd-split-job",
   "result_type":"record",
   "probability":3.6470168589396805E-6,
   "record_score":60.098482474177416,
   "initial_record_score":60.098482474177416,
   "bucket_span":3600,
   "detector_index":0,
   "is_interim":false,
   "timestamp":1503460800000,
   "by_field_name":"domain_split",
   "by_field_value":"kerberos.http.192.168,62.222",
   "function":"count",
   "function_description":"count",
   "typical":[1.9850021750253195],
   "actual":[0.0],
   "domain_split":["kerberos.http.192.168,62.222"]
}

tveasey · 2018-08-24T09:16:37Z

Agreed, this does seem odd. I think this warrants investigating what is happening inside the model. Based on Ben's last comment, I suspect this might be a numerical issue: something underflowing and not being handled properly (probably when taking a log). I'll look into this.
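For readers unfamiliar with the kind of numerical issue suspected here, the following is a generic Java illustration of underflow before a log. It is not taken from the ML model code and does not claim to reproduce the actual bug; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class LogUnderflowSketch {
    /** Naive approach: multiply the probabilities, then take the log of the product. */
    public static double logOfProduct(double[] probabilities) {
        double product = 1.0;
        for (double p : probabilities) {
            product *= p; // underflows to 0.0 once it drops below the smallest double
        }
        return Math.log(product); // Math.log(0.0) is negative infinity
    }

    /** Safer approach: sum the logs, which stays finite as long as every p > 0. */
    public static double sumOfLogs(double[] probabilities) {
        double sum = 0.0;
        for (double p : probabilities) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] probabilities = new double[100];
        Arrays.fill(probabilities, 1e-5); // true product is 1e-500, not representable as a double
        System.out.println("log of product: " + logOfProduct(probabilities));
        System.out.println("sum of logs:    " + sumOfLogs(probabilities));
    }
}
```

If a value like the first result reaches downstream scoring without being handled, a result can be silently dropped or mis-scored, which is the general failure mode being hypothesized in the comment above.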

davidkyle · 2018-10-19T11:51:57Z

This test still fails. Is it a problem with the test or a blocker for 6.5?

I updated the reproduce command above.

All of the tests in PainlessDomainSplitIT are muted with AwaitsFix, which causes the build to fail since no tests are run. This adds an empty test to get the build going again. Relates #34683. Relates #32966.
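The situation described above can be sketched in plain Java. This is only an illustration: the `AwaitsFix` annotation below is a locally declared stand-in rather than the test framework's own annotation, and the suite classes, the `testEmpty` name, and the `runnableTests` helper are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class MutedSuiteSketch {
    /** Local stand-in for a muting annotation (hypothetical, for illustration only). */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface AwaitsFix {
        String bugUrl();
    }

    /** A suite in which every test is muted, so nothing would run. */
    public static class AllMuted {
        @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/32966")
        public void testHRDSplit() {}
    }

    /** The same suite with an empty placeholder test added. */
    public static class WithEmptyTest {
        @AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/32966")
        public void testHRDSplit() {}

        public void testEmpty() {} // intentionally does nothing
    }

    /** Counts methods named test* that are not muted. */
    public static int runnableTests(Class<?> suite) {
        int count = 0;
        for (Method method : suite.getDeclaredMethods()) {
            if (method.getName().startsWith("test") && !method.isAnnotationPresent(AwaitsFix.class)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("AllMuted runnable tests: " + runnableTests(AllMuted.class));
        System.out.println("WithEmptyTest runnable tests: " + runnableTests(WithEmptyTest.class));
    }
}
```

A runner that treats "zero tests executed" as a failure would reject the first suite and accept the second, which is why the placeholder unblocks the build without unmuting anything.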

This fixes the long-muted testHRDSplit, with some minor adjustments for modern-day Elasticsearch changes :). The cause of the failure is that a new `by` field value entering the model with an exceptionally high count does not cause an anomaly. We have since stopped combining `rare` and `by` in this manner. New entries in a `by` field are not anomalous because we have no history on them yet. Closes #32966


benwtrent added the >test-failure and :ml labels Aug 17, 2018

benwtrent added a commit that referenced this issue Aug 17, 2018

Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966)

f559301

benwtrent added a commit that referenced this issue Aug 17, 2018

Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966)

647705e

dimitris-athanasiou self-assigned this Aug 20, 2018

benwtrent assigned benwtrent and unassigned dimitris-athanasiou Apr 16, 2020

benwtrent mentioned this issue Apr 16, 2020

[ML] fixing and unmuting testHRDSplit test #55349

Merged

benwtrent closed this as completed in #55349 Apr 17, 2020
