Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow dropping documents with auto-generated ID #46773

Merged
merged 7 commits into from
Sep 19, 2019

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Sep 17, 2019

When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat as well) + coordinating nodes that do not have the ingest processor functionality, this can lead to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when the request contains auto-generated IDs. The response serialization is lenient for our REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for communication between nodes) assumes for this field to be non-null, which means that it can't be serialized between nodes. Bulk requests with ingest functionality are processed on the coordinating node if the node has the ingest capability, and only otherwise sent to a different node. This means that, in order to reproduce this, one needs two nodes, with the coordinating node not having the ingest functionality.

Closes #46678

@ywelsch ywelsch added >bug :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.0.0 v7.5.0 v7.4.1 labels Sep 17, 2019
@ywelsch ywelsch requested a review from dnhatn September 17, 2019 08:58
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just needs some test adjustments.

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timing on the tests comment :) LGTM

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left one minor comment. I think this should also be backported to the 6.8 branch?

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find. LGTM.

@ywelsch ywelsch merged commit 08e3ceb into elastic:master Sep 19, 2019
@rtkgjacobs
Copy link

Thanks all for fixing this, cheers!

ywelsch added a commit that referenced this pull request Sep 19, 2019
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.

Closes #46678
ywelsch added a commit that referenced this pull request Sep 19, 2019
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.

Closes #46678
@colings86 colings86 added v7.4.0 :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP and removed v7.4.1 :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Sep 20, 2019
@ywelsch ywelsch added the v6.8.4 label Sep 20, 2019
ywelsch added a commit that referenced this pull request Sep 20, 2019
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.

The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes for this field to be non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.

Closes #46678
ywelsch added a commit that referenced this pull request Sep 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v6.8.4 v7.4.0 v7.5.0 v8.0.0-alpha1
Projects
None yet
8 participants