Receiving status code 500 on pipeline failures for bulk ingestion requests #48803

Closed
simitt opened this issue Nov 1, 2019 · 2 comments · Fixed by #48810
Labels
>bug :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP

Comments

@simitt
Contributor

simitt commented Nov 1, 2019

Elasticsearch version (bin/elasticsearch --version): 7.4.0 (also confirmed with 7.0.0)

Plugins installed: []

JVM version (java -version): openjdk 13

OS version (uname -a if on a Unix-like system): mac

Description of the problem including expected versus actual behavior:
Elasticsearch returns status code 500 for events sent to the bulk API when an applied pipeline fails to process an event and has no failure handling (ignore_failure or on_failure) defined.
I would expect a 4xx status code, as the processing fails because of invalid data and not because of a server issue.

Steps to reproduce:
(0) Create a pipeline with no failure handling, in this example a geoip pipeline:

PUT _ingest/pipeline/sample
{
  "processors" : [
    {
      "geoip" : {
        "database_file" : "GeoLite2-City.mmdb",
        "field" : "client.ip",
        "target_field" : "client.geo",
        "ignore_missing" : true
      }
    }
  ]
}

(1) Ingest data using the bulk API, applying the pipeline:

POST /_bulk
{ "index" : { "_index" : "my-index", "pipeline" : "sample" } }
{ "client" : { "ip": "unknown"} }

The ingestion fails because the registered pipeline expects client.ip to be an IP address, but an arbitrary string is sent instead.

Response:

{
  "took" : 0,
  "ingest_took" : 50,
  "errors" : true,
  "items" : [
    {
      "index" : {
        "_index" : "my-index",
        "_type" : "_doc",
        "_id" : null,
        "status" : 500,
        "error" : {
          "type" : "exception",
          "reason" : "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: 'unknown' is not an IP string literal.",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "java.lang.IllegalArgumentException: 'unknown' is not an IP string literal.",
            "caused_by" : {
              "type" : "illegal_argument_exception",
              "reason" : "'unknown' is not an IP string literal."
            }
          },
          "header" : {
            "processor_type" : "geoip"
          }
        }
      }
    }
  ]
}
@jasontedor
Member

Ultimately the problem here is broader than ingest pipelines. It relates to the fact that we are not unwrapping causes for ElasticsearchExceptions:

public static Throwable unwrapCause(Throwable t) {
    int counter = 0;
    Throwable result = t;
    while (result instanceof ElasticsearchWrapperException) {
        if (result.getCause() == null) {
            return result;
        }
        if (result.getCause() == result) {
            return result;
        }
        if (counter++ > 10) {
            // dear god, if we got more than 10 levels down, WTF? just bail
            logger.warn("Exception cause unwrapping ran for 10 levels...", t);
            return result;
        }
        result = result.getCause();
    }
    return result;
}

Because of this, when reporting the status of a document-level failure while executing a bulk request, we fall back to the default status of an ElasticsearchException (which here wraps the failure from the ingest pipeline), and that default is 500.
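
To illustrate the fallback described above, here is a minimal, self-contained sketch. It is not the Elasticsearch code: WrapperException, GenericException, MarkedWrapperException and the status method are hypothetical stand-ins, and the mapping of IllegalArgumentException to 400 is an assumption made for the example. The unwrapping loop follows the snippet quoted above, minus the logging.

```java
final class StatusSketch {
    // Hypothetical stand-in for the ElasticsearchWrapperException marker interface.
    interface WrapperException {}

    // Hypothetical stand-in for a plain ElasticsearchException: it carries a cause
    // but is NOT marked as a wrapper.
    static class GenericException extends RuntimeException {
        GenericException(Throwable cause) { super(cause); }
    }

    // Hypothetical exception that IS marked as a wrapper.
    static class MarkedWrapperException extends RuntimeException implements WrapperException {
        MarkedWrapperException(Throwable cause) { super(cause); }
    }

    // Same loop shape as the quoted unwrapCause: only marked wrappers are unwrapped.
    static Throwable unwrapCause(Throwable t) {
        int counter = 0;
        Throwable result = t;
        while (result instanceof WrapperException) {
            Throwable cause = result.getCause();
            if (cause == null || cause == result || counter++ > 10) {
                return result;
            }
            result = cause;
        }
        return result;
    }

    // Assumed, simplified status mapping: bad input is 400, anything else is 500.
    static int status(Throwable t) {
        Throwable unwrapped = unwrapCause(t);
        return (unwrapped instanceof IllegalArgumentException) ? 400 : 500;
    }

    public static void main(String[] args) {
        Throwable root = new IllegalArgumentException("'unknown' is not an IP string literal.");
        // Not a marked wrapper, so the cause is never inspected.
        System.out.println(status(new GenericException(root)));       // prints 500
        // Marked wrapper, so the IllegalArgumentException is reached.
        System.out.println(status(new MarkedWrapperException(root))); // prints 400
    }
}
```

In this sketch the same root cause yields 500 or 400 depending only on whether the outer exception is marked as a wrapper, which is the behavior reported for the bulk item in this issue.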

@javanna javanna added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Nov 4, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)
