
DRAFT multifields #9

Closed
wants to merge 92 commits into from

Conversation

pgomulka
Owner

@pgomulka pgomulka commented Dec 2, 2019

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your changes locally prior to submission with gradle check?
  • If submitting code, is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

ezimuel and others added 30 commits November 21, 2019 22:59
…elastic#49270)

* [ML][Inference] Fixing pre-processor value handling and size estimate

* fixing npe
)

If some replica is performing a file-based recovery, then the check
assertNoSnapshottedIndexCommit would fail. We should increase the
timeout for this check so that we can wait until all recoveries are done
or aborted.

Closes elastic#49403
Part of elastic#49406. Sometimes restarting systemd-journald fails, so detect
this and attempt to log more information.
Add extra checks to prevent the ConstantFolding rule from trying to fold
the CASE/IIF functions early, before the SimplifyCase rule gets applied.

Fixes: elastic#49387
Just realized we were missing some annotations here, which was somewhat
confusing since other methods/parameters have the `Nullable` annotation
wherever a `null` can be passed.
Reformats the edge n-gram and n-gram token filter docs. Changes include:

* Adds title abbreviations
* Updates the descriptions and adds Lucene links
* Reformats parameter definitions
* Adds analyze and custom analyzer snippets
* Adds notes explaining differences between the edge n-gram and n-gram
  filters

Additional changes:
* Switches titles to use "n-gram" throughout.
* Fixes a typo in the edge n-gram tokenizer docs
* Adds an explicit anchor for the `index.max_ngram_diff` setting
The default merge cumulator used in the netty transport leads to additional
GC pressure and memory copying when a message that exceeds the chunk
size is handled. This is especially a problem on G1 GC, since we get
many "humongous" allocations, and that can in theory cause the real-memory
circuit breaker to trip unnecessarily.
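
As a rough sketch of the kind of alternative netty offers (not necessarily the exact change in this commit), `ByteToMessageDecoder` also ships a composite cumulator that chains incoming buffers instead of copying them into one ever-growing buffer, avoiding large contiguous allocations:

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;

import java.util.List;

// Hypothetical decoder: the composite cumulator chains incoming chunks rather
// than copying them into a single growing buffer, so a message larger than the
// chunk size no longer produces the "humongous" allocations G1 complains about.
public class ChunkedMessageDecoder extends ByteToMessageDecoder {

    public ChunkedMessageDecoder() {
        setCumulator(COMPOSITE_CUMULATOR);
    }

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        // frame decoding for the transport protocol would go here
    }
}
```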
The test task is configured to use the runtime Java version, but there
are issues with the version of Groovy used by Gradle pre-6.0. In order
to work around this, we use the Gradle JDK to execute the build-tools
tests.

Closes elastic#49404
Closes elastic#49253
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.

The API consolidates information that is useful before
creating a data frame analytics job.

It includes:

- memory estimation
- field selection explanation

Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.

Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.
The problem reported in elastic#44566 should be fixed by the change
that was made in elastic#49367, so the muted test can be unmuted.

Closes elastic#44566
The NodeTests class contains tests that check behavior when shutting
down a node. This involves starting a node, performing some operation,
stopping the node, and then awaiting the close of the node. Part of
closing a node is the termination of the node's ThreadPool. ThreadPool
termination semantics can be deceiving. The ThreadPool#terminate method
takes a timeout value and the first oddity is that the terminate method
can take two times the timeout value before returning. Internally this
method acts on the ExecutorService instances that are held by the
ThreadPool. First, an orderly shutdown is attempted and pending tasks
are allowed to execute while waiting for the timeout value. If any of
the ExecutorService instances have not terminated, a call is made to
attempt to stop all active tasks (usually using interrupts) and then
waits for up to the timeout value a second time for the termination of
the ExecutorService instances. This means that if we use a large value
when waiting for a node to close, we may not attempt to interrupt any
threads that are in a blocking call before the test times out.

In order to avoid causing these tests to time out, this change reduces
the timeout passed to Node#awaitClose to 10 seconds from 1 day. This
will allow blocked threads to be interrupted before the test suite
fails due to the timeout.

Closes elastic#44256
Closes elastic#42350
Closes elastic#44435
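
For reference, the two-phase shutdown described above is the standard `ExecutorService` pattern; a minimal sketch (illustrative, not the actual ThreadPool code) shows why the wait can take up to twice the timeout value:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative two-phase termination: wait for an orderly shutdown first, then
// interrupt active tasks and wait for the same timeout again, so the total
// wait can be up to twice the configured timeout.
static boolean terminate(ExecutorService executor, long timeout, TimeUnit unit) throws InterruptedException {
    executor.shutdown();                                 // stop accepting new tasks, let pending ones run
    if (executor.awaitTermination(timeout, unit)) {
        return true;
    }
    executor.shutdownNow();                              // interrupt whatever is still running
    return executor.awaitTermination(timeout, unit);     // wait up to the timeout a second time
}
```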
This commit enhances the required pipeline functionality by changing it
so that default/request pipelines can also be executed, but the required
pipeline is always executed last. This gives users the flexibility to
execute their own indexing pipelines, but also ensure that any required
pipelines are also executed. Since such pipelines are executed last, we
change the name of required pipelines to final pipelines.
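
As an illustration of the renamed behavior (pipeline ids below are hypothetical), an index can declare both a default and a final pipeline, and the final pipeline always runs last:

```java
import org.elasticsearch.common.settings.Settings;

// Sketch only: both settings can be set on an index; the final pipeline runs
// after any default or request pipeline. The pipeline ids are hypothetical.
static final Settings PIPELINE_SETTINGS = Settings.builder()
    .put("index.default_pipeline", "my_default_pipeline")
    .put("index.final_pipeline", "my_final_pipeline")
    .build();
```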
This commit adjusts the version final pipeline serialization after it
was backported to the 7.x branch which is currently versioned 7.6.0.
This is related to elastic#49067. This commit adds the simple connection
strategy settings and strategy mode setting to the cluster settings
registry. With these changes, the simple connection mode can be used.
Additionally, it adds validation to ensure that settings cannot be
misconfigured.
This commit adjusts the version final pipeline serialization after it
was backported to the 7.5 branch.
…stic#49346)

Previously, request and response objects related to index creation and mappings
were used in both the transport layer and HLRC. Now that they are no longer
shared, we can remove the extra xContent serialization + deserialization logic.
Adds support for proper cancel tasks parsing.

Closes elastic#45414
Adds a missing float tag to the edge n-gram tokenizer docs. This tag
ensures the edge n-gram tokenizer docs display on the same page.
Add a mirror of the maven repository of the shibboleth project
and upgrade opensaml and related dependencies to the latest
available version.

Resolves: elastic#44947
This commit ensures that even for requests that are known to have an empty body
we at least attempt to read one byte from the request body input stream.
This is done to work around the behavior in `sun.net.httpserver.ServerImpl.Dispatcher#handleEvent`
that will close a TCP/HTTP connection that does not have the `eof` flag (see `sun.net.httpserver.LeftOverInputStream#isEOF`)
set on its input stream. As far as I can tell, the only way to set this flag is to do a read when there are no more bytes buffered.
This fixes the numerous connection-closing issues, because `ServerImpl` stops closing connections that it thinks
weren't fully drained.

Also, I removed a now-redundant drain loop in the Azure handler, as well as the connection closing in the error handler's
drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO).

I would suggest merging this and closing related issue after verifying that this fixes things on CI.

The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests
and move them to single-digit values. This makes the retries happen quickly enough that they run into the asynchronous closing
of allegedly non-EOF connections by `ServerImpl` and produces the exact kinds of failures we're seeing currently.

Relates elastic#49401, elastic#49429
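
A minimal sketch of the workaround (a hypothetical helper, not the exact code in this change): even for a request expected to have no body, reading the exchange's input stream to EOF is what sets the flag that `LeftOverInputStream#isEOF` checks:

```java
import com.sun.net.httpserver.HttpExchange;

import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: always attempt at least one read so the server-side
// input stream reaches EOF and ServerImpl does not treat the connection as
// "not fully drained" and close it.
static void drainRequestBody(HttpExchange exchange) throws IOException {
    try (InputStream body = exchange.getRequestBody()) {
        byte[] buffer = new byte[8192];
        while (body.read(buffer) >= 0) {
            // discard; read() returning -1 is what marks the stream as EOF
        }
    }
}
```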
Add debug logging for transform creation and disallow partial results for retrieval
tvernum and others added 27 commits November 27, 2019 13:02
Authentication has grown more complex with the addition of new realm
types and authentication methods. When user authentication does not
behave as expected it can be difficult to determine where and why it
failed.

This commit adds DEBUG and TRACE logging at key points in the
authentication flow so that it is possible to gain additional insight
into the operation of the system.

Relates: elastic#49473
Adds support for templating to `field` and `target_field` options.
Fixes a bug where a scripted upsert that causes a dynamic mapping update is retried (because
the mapping update is still in flight), and the request is mutated multiple times.

Closes elastic#48670
When using unpooled, the allocator is wrapped twice in a NoDirectBuffers.
Fix the reference to the uid:gid that Elasticsearch runs as inside
the Docker container and add a packaging test to ensure that bind
mounting a data dir with a random uid and gid:0 works as
expected.

Relates elastic#49529
Closes elastic#47929
Updates randomizedrunner from 2.7.1 to 2.7.4, which includes some fixes
related to race conditions/deadlocks.
…lastic#48815)

Type filters and intermediate type levels in mappings responses have already been
removed from the GetFieldMappings REST layer; we can also remove them from the
internal Node client classes.

Relates to elastic#41059
…stic#49166)

This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false`, any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field in future versions, but it will be set to `true` when backporting
to 7.x. When the setting is set to `true` (manually or by default in 7.x), the loading will also issue
a deprecation warning, since we want to disallow fielddata entirely when elastic#26472
is implemented.

Closes elastic#43599
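
For illustration only, a dynamic node-scope boolean setting of this shape is typically declared with the `Setting` API roughly as follows (a sketch; the exact declaration and default differ per branch):

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;

// Sketch of a dynamic cluster-level boolean setting; the default shown here
// (`true`, as on 7.x) is an assumption for illustration.
public static final Setting<Boolean> ID_FIELD_DATA_ENABLED_SETTING =
    Setting.boolSetting(
        "indices.id_field_data.enabled",
        true,                     // default on 7.x; `false` on master per this change
        Property.Dynamic,         // updatable via the cluster settings API
        Property.NodeScope
    );
```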
This commit adds templating support to the pipeline processor's `name` option.

Closes elastic#39955
)

This PR fixes a trivial typo that affects assigning null_value in the GeoPointFieldMapper
This commit clarifies how to override JAVA_HOME from the bundled jdk for
deb and rpm installs, which each have their own file that is sourced
upon service startup.

closes elastic#49068
This commit removes outdated documentation about a path setting for file
scripts, which no longer exist.

closes elastic#45827
This commit sets an output marker file for the docker build tasks so
that they can be tracked as up to date. It also fixes the docker build
context task to omit the build date as an input property, which always
left the task out of date.

relates elastic#49359
The test was slightly modified with elastic#49166; the two test documents in
`testNormalization` look like they should mirror the document id in the "id"
field in order for it to work as a tie breaker.

Closes elastic#49654
This commit moves the packaging tests for elasticsearch-setup-passwords
from bats to java. The change also enables future tests to enable
security in Elasticsearch and automatically have waitForElasticsearch
work correctly, at least to the same extent it worked in bats, by
waiting on the ES port instead of a health check.

relates elastic#46005
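
Waiting on the port can be as simple as polling for a successful TCP connect; a hedged sketch with illustrative names:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;

// Illustrative sketch: poll until the Elasticsearch HTTP port accepts a TCP
// connection, rather than waiting for a cluster health check to succeed.
static void waitForPort(String host, int port, Duration timeout) throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (System.nanoTime() < deadline) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 1000);
            return; // the port is accepting connections
        } catch (IOException e) {
            Thread.sleep(500); // not listening yet, retry
        }
    }
    throw new AssertionError("Elasticsearch port " + port + " did not open within " + timeout);
}
```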
SecurityIT.testGetUser creates a user for testing purposes, but does
not delete the user at the end of the test. This could leave the
cluster in an unexpected state for other tests.

This commit:
- Deletes the user at the end of `testGetUser`
- Adds the test-name as metadata to the users that are created in `SecurityIT`
  so that their origin is clear if they do interfere with other tests
- Enables SecurityDocumentationIT.testGetUsers on the expectation that
  the new cleanup step will resolve the unreliability of that test.

Relates: elastic#48440
This stems from a time when index requests were directly forwarded to
TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is
obsolete.

Closes elastic#20279
…lastic#48580)

This commit adds a new histogram field mapper that stores a pre-aggregated format of numerical data to be used in percentiles aggregations.
This test no longer relies on the JDK version, so the assume should be removed.
relates elastic#48209
@pgomulka
Owner Author

pgomulka commented Dec 2, 2019

@elasticmachine run elasticsearch-ci/1

@pgomulka pgomulka closed this Dec 2, 2019
pgomulka pushed a commit that referenced this pull request Sep 6, 2023
Fixes elastic/elasticsearch-internal#497
Fixes ESQL-560

A query like `from test | sort data | limit 2 | project count` fails
because the `LocalToGlobalLimitAndTopNExec` planning rule adds a collecting
`TopNExec` after the last GATHER exchange, to perform the final reduce; see

```
TopNExec[[Order[data{f}#6,ASC,LAST]],2[INTEGER]]
\_ExchangeExec[GATHER,SINGLE_DISTRIBUTION]
  \_ProjectExec[[count{f}#4]]      // <- `data` is projected away but still used by the TopN node above
    \_FieldExtractExec[count{f}#4]
      \_TopNExec[[Order[data{f}#6,ASC,LAST]],2[INTEGER]]
        \_FieldExtractExec[data{f}#6]
          \_ExchangeExec[REPARTITION,FIXED_ARBITRARY_DISTRIBUTION]
            \_EsQueryExec[test], query[][_doc_id{f}#9, _segment_id{f}#10, _shard_id{f}#11]
```

Unfortunately, at that stage the inputs needed by the TopNExec could
have been projected away by a ProjectExec, so they may no longer
be available.

This PR adapts the plan as follows:
- add all the projections used by the `TopNExec` to the existing
`ProjectExec`, so that they are available when needed
- add another ProjectExec on top of the plan, to project away the
originally removed projections and preserve the query semantics


This approach is a bit dangerous, because it bypasses the mechanism of
input/output resolution and validation that happens on the logical plan.
The alternative would be to do this manipulation on the logical plan,
but it's probably hard to do, because there is no concept of Exchange at
that level.