
DRAFT multifields #9

Closed
wants to merge 92 commits into from

Conversation

pgomulka
Owner

@pgomulka pgomulka commented Dec 2, 2019

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your changes locally prior to submission with gradle check?
  • If submitting code, is your pull request against master? Unless there is a good reason otherwise, we prefer pull requests against master and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

ezimuel and others added 30 commits November 21, 2019 22:59
…elastic#49270)

* [ML][Inference] Fixing pre-processor value handling and size estimate

* fixing npe
)

If some replica is performing a file-based recovery, then the check
assertNoSnapshottedIndexCommit would fail. We should increase the
timeout for this check so that we can wait until all recoveries are done
or aborted.

Closes elastic#49403
Part of elastic#49406. Sometimes restarting systemd-journald fails, so detect
this and attempt to log more information.
Add extra checks to prevent the ConstantFolding rule from trying to fold
the CASE/IIF functions early, before the SimplifyCase rule gets applied.

Fixes: elastic#49387
Just realized we were missing some annotations here, which was somewhat
confusing since other methods/parameters have the `Nullable` annotation
wherever a `null` can be passed.
Reformats the edge n-gram and n-gram token filter docs. Changes include:

* Adds title abbreviations
* Updates the descriptions and adds Lucene links
* Reformats parameter definitions
* Adds analyze and custom analyzer snippets
* Adds notes explaining differences between the edge n-gram and n-gram
  filters

Additional changes:
* Switches titles to use "n-gram" throughout.
* Fixes a typo in the edge n-gram tokenizer docs
* Adds an explicit anchor for the `index.max_ngram_diff` setting
The default merge cumulator used in the netty transport leads to additional
GC pressure and memory copying when a message that exceeds the chunk
size is handled. This is especially a problem on G1 GC, since we get
many "humongous" allocations, and that can in theory cause the real-memory
circuit breaker to trip unnecessarily.
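
As a rough sketch of the kind of alternative netty offers (not necessarily the exact change in this commit), `ByteToMessageDecoder` also ships a composite cumulator that chains incoming buffers instead of copying them into one ever-growing buffer, avoiding large contiguous allocations:

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;

import java.util.List;

// Hypothetical decoder: the composite cumulator chains incoming chunks rather
// than copying them into a single growing buffer, so a message larger than the
// chunk size no longer produces the "humongous" allocations G1 complains about.
public class ChunkedMessageDecoder extends ByteToMessageDecoder {

    public ChunkedMessageDecoder() {
        setCumulator(COMPOSITE_CUMULATOR);
    }

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        // frame decoding for the transport protocol would go here
    }
}
```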
The test task is configured to use the runtime Java version, but there
are issues with the version of Groovy used by Gradle pre-6.0. In order
to work around this, we use the Gradle JDK to execute the build-tools
tests.

Closes elastic#49404
Closes elastic#49253
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.

The API consolidates information that is useful before
creating a data frame analytics job.

It includes:

- memory estimation
- field selection explanation

Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.

Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.
The problem reported in elastic#44566 should be fixed by the change
that was made in elastic#49367, so the muted test can be unmuted.

Closes elastic#44566
The NodeTests class contains tests that check behavior when shutting
down a node. This involves starting a node, performing some operation,
stopping the node, and then awaiting the close of the node. Part of
closing a node is the termination of the node's ThreadPool. ThreadPool
termination semantics can be deceiving. The ThreadPool#terminate method
takes a timeout value and the first oddity is that the terminate method
can take two times the timeout value before returning. Internally this
method acts on the ExecutorService instances that are held by the
ThreadPool. First, an orderly shutdown is attempted and pending tasks
are allowed to execute while waiting for the timeout value. If any of
the ExecutorService instances have not terminated, a call is made to
attempt to stop all active tasks (usually using interrupts) and then
waits for up to the timeout value a second time for the termination of
the ExecutorService instances. This means that if we use a large value
when waiting for a node to close, we may not attempt to interrupt any
threads that are in a blocking call before the test times out.

In order to avoid causing these tests to time out, this change reduces
the timeout passed to Node#awaitClose to 10 seconds from 1 day. This
will allow blocked threads to be interrupted before the test suite
fails due to the timeout.

Closes elastic#44256
Closes elastic#42350
Closes elastic#44435
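
For reference, the two-phase shutdown described above is the standard `ExecutorService` pattern; a minimal sketch (illustrative, not the actual ThreadPool code) shows why the wait can take up to twice the timeout value:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative two-phase termination: wait for an orderly shutdown first, then
// interrupt active tasks and wait for the same timeout again, so the total
// wait can be up to twice the configured timeout.
static boolean terminate(ExecutorService executor, long timeout, TimeUnit unit) throws InterruptedException {
    executor.shutdown();                                 // stop accepting new tasks, let pending ones run
    if (executor.awaitTermination(timeout, unit)) {
        return true;
    }
    executor.shutdownNow();                              // interrupt whatever is still running
    return executor.awaitTermination(timeout, unit);     // wait up to the timeout a second time
}
```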
This commit enhances the required pipeline functionality by changing it
so that default/request pipelines can also be executed, but the required
pipeline is always executed last. This gives users the flexibility to
execute their own indexing pipelines, but also ensure that any required
pipelines are also executed. Since such pipelines are executed last, we
change the name of required pipelines to final pipelines.
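
As an illustration of the renamed behavior (pipeline ids below are hypothetical), an index can declare both a default and a final pipeline, and the final pipeline always runs last:

```java
import org.elasticsearch.common.settings.Settings;

// Sketch only: both settings can be set on an index; the final pipeline runs
// after any default or request pipeline. The pipeline ids are hypothetical.
static final Settings PIPELINE_SETTINGS = Settings.builder()
    .put("index.default_pipeline", "my_default_pipeline")
    .put("index.final_pipeline", "my_final_pipeline")
    .build();
```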
This commit adjusts the version final pipeline serialization after it
was backported to the 7.x branch which is currently versioned 7.6.0.
This is related to elastic#49067. This commit adds the simple connection
strategy settings and strategy mode setting to the cluster settings
registry. With these changes, the simple connection mode can be used.
Additionally, it adds validation to ensure that settings cannot be
misconfigured.
This commit adjusts the version final pipeline serialization after it
was backported to the 7.5 branch.
…stic#49346)

Previously, request and response objects related to index creation and mappings
were used in both the transport layer and HLRC. Now that they are no longer
shared, we can remove the extra xContent serialization + deserialization logic.
Adds support for proper cancel tasks parsing.

Closes elastic#45414
Adds a missing float tag to the edge n-gram tokenizer docs. This tag
ensures the edge n-gram tokenizer docs display on the same page.
Add a mirror of the maven repository of the shibboleth project
and upgrade opensaml and related dependencies to the latest
available version.

Resolves: elastic#44947
This commit ensures that even for requests that are known to have an empty body
we at least attempt to read one byte from the request body input stream.
This is done to work around the behavior in `sun.net.httpserver.ServerImpl.Dispatcher#handleEvent`
that will close a TCP/HTTP connection that does not have the `eof` flag (see `sun.net.httpserver.LeftOverInputStream#isEOF`)
set on its input stream. As far as I can tell, the only way to set this flag is to do a read when there are no more bytes buffered.
This fixes the numerous connection-closing issues, because `ServerImpl` stops closing connections that it thinks
weren't fully drained.

Also, I removed a now-redundant drain loop in the Azure handler, as well as the connection closing in the error handler's
drain action (this shouldn't have an effect but makes things more predictable/easier to reason about IMO).

I would suggest merging this and closing related issue after verifying that this fixes things on CI.

The way to locally reproduce the issues we're seeing in tests is to make the retry timings more aggressive in e.g. the azure tests
and move them to single-digit values. This makes the retries happen quickly enough that they run into the asynchronous closing
of allegedly non-EOF connections by `ServerImpl` and produces the exact kinds of failures we're seeing currently.

Relates elastic#49401, elastic#49429
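
A minimal sketch of the workaround (a hypothetical helper, not the exact code in this change): even for a request expected to have no body, reading the exchange's input stream to EOF is what sets the flag that `LeftOverInputStream#isEOF` checks:

```java
import com.sun.net.httpserver.HttpExchange;

import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: always attempt at least one read so the server-side
// input stream reaches EOF and ServerImpl does not treat the connection as
// "not fully drained" and close it.
static void drainRequestBody(HttpExchange exchange) throws IOException {
    try (InputStream body = exchange.getRequestBody()) {
        byte[] buffer = new byte[8192];
        while (body.read(buffer) >= 0) {
            // discard; read() returning -1 is what marks the stream as EOF
        }
    }
}
```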
Add debug logging for transform creation and disallow partial results for retrieval
tvernum and others added 27 commits November 27, 2019 13:02
Authentication has grown more complex with the addition of new realm
types and authentication methods. When user authentication does not
behave as expected it can be difficult to determine where and why it
failed.

This commit adds DEBUG and TRACE logging at key points in the
authentication flow so that it is possible to gain additional insight
into the operation of the system.

Relates: elastic#49473
Adds support for templating to `field` and `target_field` options.
Fixes a bug where a scripted upsert that causes a dynamic mapping update is retried (because
the mapping update is still in flight), and the request is mutated multiple times.

Closes elastic#48670
When using unpooled, the allocator is wrapped twice in a NoDirectBuffers.
Fix the reference to the uid:gid that Elasticsearch runs as inside
the Docker container and add a packaging test to ensure that bind
mounting a data dir with a random uid and gid:0 works as
expected.

Relates elastic#49529
Closes elastic#47929
Updates randomizedrunner from 2.7.1 to 2.7.4, which includes some fixes
related to race conditions/deadlocks.
…lastic#48815)

Type filters and intermediate type levels in mappings responses have already been
removed from the GetFieldMappings REST layer; we can also remove them from the
internal Node client classes.

Relates to elastic#41059
…stic#49166)

This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false`, any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field in future versions, but it will be set to `true` when backporting
to 7.x. When the setting is set to `true` (manually or by default in 7.x), the loading will also issue
a deprecation warning, since we want to disallow fielddata entirely when elastic#26472
is implemented.

Closes elastic#43599
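
For illustration only, a dynamic node-scope boolean setting of this shape is typically declared with the `Setting` API roughly as follows (a sketch; the exact declaration and default differ per branch):

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Setting.Property;

// Sketch of a dynamic cluster-level boolean setting; the default shown here
// (`true`, as on 7.x) is an assumption for illustration.
public static final Setting<Boolean> ID_FIELD_DATA_ENABLED_SETTING =
    Setting.boolSetting(
        "indices.id_field_data.enabled",
        true,                     // default on 7.x; `false` on master per this change
        Property.Dynamic,         // updatable via the cluster settings API
        Property.NodeScope
    );
```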
This commit adds templating support to the pipeline processor's `name` option.

Closes elastic#39955
)

This PR fixes a trivial typo that affects assigning null_value in the GeoPointFieldMapper
This commit clarifies how to override JAVA_HOME from the bundled jdk for
deb and rpm installs, which each have their own file that is sourced
upon service startup.

closes elastic#49068
This commit removes outdated documentation about a path setting for file
scripts, which no longer exist.

closes elastic#45827
This commit sets an output marker file for the docker build tasks so
that they can be tracked as up to date. It also fixes the docker build
context task to omit the build date as an input property, which always
left the task out of date.

relates elastic#49359
The test was slightly modified with elastic#49166; the two test documents in
`testNormalization` look like they should mirror the document id in the "id"
field in order for it to work as a tie breaker.

Closes elastic#49654
This commit moves the packaging tests for elasticsearch-setup-passwords
from bats to java. The change also enables future tests to enable
security in Elasticsearch and automatically have waitForElasticsearch
work correctly, at least to the same extent it worked in bats, by
waiting on the ES port instead of a health check.

relates elastic#46005
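
Waiting on the port can be as simple as polling for a successful TCP connect; a hedged sketch with illustrative names:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;

// Illustrative sketch: poll until the Elasticsearch HTTP port accepts a TCP
// connection, rather than waiting for a cluster health check to succeed.
static void waitForPort(String host, int port, Duration timeout) throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (System.nanoTime() < deadline) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 1000);
            return; // the port is accepting connections
        } catch (IOException e) {
            Thread.sleep(500); // not listening yet, retry
        }
    }
    throw new AssertionError("Elasticsearch port " + port + " did not open within " + timeout);
}
```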
SecurityIT.testGetUser creates a user for testing purposes, but does
not delete the user at the end of the test. This could leave the
cluster in an unexpected state for other tests.

This commit:
- Deletes the user at the end of `testGetUser`
- Adds the test-name as metadata to the users that are created in `SecurityIT`
  so that their origin is clear if they do interfere with other tests
- Enables SecurityDocumentationIT.testGetUsers on the expectation that
  the new cleanup step will resolve the unreliability of that test.

Relates: elastic#48440
This stems from a time when index requests were directly forwarded to
TransportReplicationAction. Nowadays they are wrapped in a BulkShardRequest, and this logic is
obsolete.

Closes elastic#20279
…lastic#48580)

This commit adds a new histogram field mapper that stores a pre-aggregated format of numerical data to be used in percentiles aggregations.
This test no longer relies on the JDK version, so the assume should be removed.
relates elastic#48209
@pgomulka
Owner Author

pgomulka commented Dec 2, 2019

@elasticmachine run elasticsearch-ci/1

@pgomulka pgomulka closed this Dec 2, 2019
pgomulka pushed a commit that referenced this pull request Sep 6, 2023
Fixes elastic/elasticsearch-internal#497
Fixes ESQL-560

A query like `from test | sort data | limit 2 | project count` fails
because the `LocalToGlobalLimitAndTopNExec` planning rule adds a collecting
`TopNExec` after the last GATHER exchange, to perform the final reduce; see

```
TopNExec[[Order[data{f}#6,ASC,LAST]],2[INTEGER]]
\_ExchangeExec[GATHER,SINGLE_DISTRIBUTION]
  \_ProjectExec[[count{f}#4]]      // <- `data` is projected away but still used by the TopN node above
    \_FieldExtractExec[count{f}#4]
      \_TopNExec[[Order[data{f}#6,ASC,LAST]],2[INTEGER]]
        \_FieldExtractExec[data{f}#6]
          \_ExchangeExec[REPARTITION,FIXED_ARBITRARY_DISTRIBUTION]
            \_EsQueryExec[test], query[][_doc_id{f}#9, _segment_id{f}#10, _shard_id{f}#11]
```

Unfortunately, at that stage the inputs needed by the TopNExec could
have been projected away by a ProjectExec, so they may no longer
be available.

This PR adapts the plan as follows:
- add all the projections used by the `TopNExec` to the existing
`ProjectExec`, so that they are available when needed
- add another ProjectExec on top of the plan, to project away the
originally removed projections and preserve the query semantics


This approach is a bit dangerous, because it bypasses the mechanism of
input/output resolution and validation that happens on the logical plan.
The alternative would be to do this manipulation on the logical plan,
but it's probably hard to do, because there is no concept of Exchange at
that level.