Span format is not well suited for ES #906

Closed
Monnoroch opened this issue Jul 3, 2018 · 39 comments

@Monnoroch

Monnoroch commented Jul 3, 2018

Jaeger spans, when put in elasticsearch, have the following structure:

{
  "_index": "jaeger-span-2018-07-02",
  "_type": "span",
  "_id": "a3RsXWQBjGb8h888uVuL",
  "_version": 1,
  "_score": null,
  "_source": {
    "traceID": "f229796cda43df60",
    "spanID": "274c0d9d4fc76848",
    "parentSpanID": "f229796cda43df60",
    "flags": 1,
    "operationName": "/",
    "references": [
      {
        "refType": "CHILD_OF",
        "traceID": "f229796cda43df60",
        "spanID": "f229796cda43df60"
      }
    ],
    "startTime": 1530575763255726,
    "duration": 1798,
    "tags": [
      {
        "key": "component",
        "type": "string",
        "value": "nginx"
      },
      {
        "key": "nginx.worker_pid",
        "type": "string",
        "value": "10767"
      },
      {
        "key": "peer.address",
        "type": "string",
        "value": "10.244.4.100:37542"
      },
      {
        "key": "http.method",
        "type": "string",
        "value": "POST"
      },
      {
        "key": "http.url",
        "type": "string",
        "value": "YYY"
      },
      {
        "key": "http.host",
        "type": "string",
        "value": "XXX"
      },
      {
        "key": "http.status_code",
        "type": "int64",
        "value": "204"
      },
      {
        "key": "http.status_line",
        "type": "string",
        "value": "204 No Content"
      }
    ],
    "logs": [],
    "processID": "",
    "process": {
      "serviceName": "ingress-controller",
      "tags": [
        {
          "key": "jaeger.version",
          "type": "string",
          "value": "C++-0.2.0"
        },
        {
          "key": "hostname",
          "type": "string",
          "value": "vega"
        },
        {
          "key": "ip",
          "type": "string",
          "value": "127.0.0.1"
        }
      ]
    },
    "warnings": null,
    "startTimeMillis": 1530575763255
  },
  "fields": {
    "startTimeMillis": [
      "2018-07-02T23:56:03.255Z"
    ]
  },
  "sort": [
    1530575763255
  ]
}

Notice the arrays here. We were thinking about completely replacing debug logs with debug traces, but because everything is in arrays we can't index these spans in ES and thus can't reliably search them. Jaeger is nice, but ES has much richer search capabilities, and it would be great if we could treat spans as regular structured documents that we can put in ES and index properly.

Are there any plans to support this use case?

@yurishkuro
Member

  • (a) as far as I know, we already index everything in the spans in ES, despite having arrays
  • (b) what alternative format do you propose?

@Monnoroch
Author

How interesting. When I create the index pattern from the spans in Kibana, it does indeed pick up all the fields in arrays. However, I can't search in Kibana by tag, for example. Is there any specific ES/Kibana configuration I need to apply? I didn't see anything in the docs.

@objectiser
Contributor

Maybe having two maps, tags and tagTypes:

    "tags": {
        "component": "nginx",
        "nginx.worker_pid": "10767",
        "peer.address": "10.244.4.100:37542",
        "http.method": "POST",
        ......
     },
     "tagTypes": {
        "component": "string",
        "nginx.worker_pid": "string",
        "peer.address": "string",
        "http.method": "string",
        ......
     }
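
As an illustration of the two-map idea, here is a minimal sketch that converts the current list of key/type/value objects into a tags map and a tagTypes map. The function name `split_tags` is hypothetical and not part of Jaeger; if a key occurs twice in a span, the last occurrence wins in this sketch.

```python
def split_tags(tags):
    """Turn [{key, type, value}, ...] into two parallel maps keyed by tag name."""
    values = {}
    types = {}
    for tag in tags:
        # A repeated key overwrites the earlier entry (assumption of this sketch).
        values[tag["key"]] = tag["value"]
        types[tag["key"]] = tag["type"]
    return values, types


span_tags = [
    {"key": "component", "type": "string", "value": "nginx"},
    {"key": "http.status_code", "type": "int64", "value": "204"},
]
tags, tag_types = split_tags(span_tags)
print(tags)
print(tag_types)
```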

@Monnoroch
Author

Monnoroch commented Jul 3, 2018

I'm not sure discussing the actual format is appropriate at this stage, given the "Jaeger indexes everything" response. @yurishkuro do you have any pointers on how I might get Kibana to search spans by tags, logs, etc.? Let's investigate this and only start thinking about changing the schema after we conclude that it's indeed impossible today.

@yurishkuro
Member

I am not an expert in Kibana. You can look at the code for ES span storage to see how it executes queries.

https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/spanstore/reader.go

@Monnoroch
Author

Monnoroch commented Jul 3, 2018

@yurishkuro Kibana is just a nice UI to render logs, it doesn't do anything interesting.

The issue is that Kibana cannot search on nested objects, so this is not necessarily Jaeger's problem. However, from the product perspective, this issue means Jaeger cannot completely replace debug logs, which it clearly should be able to do, as traces basically ARE logs with some additional metadata for the tree structure. So this implementation detail gets in the way of a very reasonable scenario that could halve the storage needed for debug logs for teams who already use ES. I say it's definitely worth implementing.
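
For context, searching inside nested objects in ES needs a `nested` query in the query DSL rather than a plain field match. Below is a minimal sketch of such a request body, assuming `tags` is mapped as a nested field with `key` and `value` sub-fields as in the span document at the top of this issue. The helper name `nested_tag_query` is hypothetical.

```python
import json


def nested_tag_query(key, value):
    """Build an ES search body matching spans that carry the tag key=value."""
    return {
        "query": {
            "nested": {
                "path": "tags",  # assumes "tags" uses the nested datatype
                "query": {
                    "bool": {
                        # Both conditions must hold on the same nested tag object.
                        "must": [
                            {"match": {"tags.key": key}},
                            {"match": {"tags.value": value}},
                        ]
                    }
                },
            }
        }
    }


body = nested_tag_query("http.method", "POST")
print(json.dumps(body, indent=2))
```

A body like this can be sent to the `_search` endpoint of the span index with curl or any ES client, which is the manual route mentioned later in the thread.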

@yurishkuro
Member

I am still not clear what your proposal is that is "worth implementing". Traces are not regular logs; they are more structured. You can forward logs into the current span, which creates a nested, one-to-many structure. You can use Jaeger UI to view those logs in the context of the span that they belong to, arguably a much more useful experience than looking at a flat dump of logs across all requests.

@Monnoroch
Author

Monnoroch commented Jul 3, 2018

I propose a flag to enable a different ES schema that will have plain objects instead of arrays with nested subobjects. One such proposal was presented above, but one possible alternative would be

{
    "tags": {
        "component": {
            "type": "string",
            "value": "nginx"
        },
        ......
     }
}

Basically, with the tag name in the key. Same with process.tags and logs.

@yurishkuro
Member

afaik this creates problems when tags have dots in them, which a lot of standard OT tags do, because ES treats dots as hierarchy.

Also, how does this solve the problem with logs? The logs are still an array within the span.
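
To make the dots concern concrete, here is a toy model (not ES code) of how dotted keys in an object expand into a hierarchy, and how a tag such as `error` can then collide with a tag such as `error.kind`. The function name `expand_dots` and the example keys are assumptions for illustration.

```python
def expand_dots(flat):
    """Expand dotted keys into nested dicts, raising on object/scalar clashes."""
    root = {}
    for dotted_key, value in flat.items():
        node = root
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{dotted_key!r}: {part!r} already holds a scalar")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{dotted_key!r} already holds an object")
        node[leaf] = value
    return root


print(expand_dots({"http.method": "GET", "http.url": "/"}))

try:
    expand_dots({"error": "true", "error.kind": "timeout"})
except ValueError as exc:
    print("conflict:", exc)
```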

@Monnoroch
Author

The logs would be stored in exactly the same way, with something unique as a key. Possibly the timestamp, but I haven't put much thought into it yet.

The dots can be converted to something else like _ for storage, can't they?

@Monnoroch
Author

Monnoroch commented Jul 3, 2018

> You can use Jaeger UI to view those logs in the context of the span that they belong to, arguably a lot more useful experience than looking at a flat dump of logs across all requests.

Yes, Jaeger UI is a better tool for looking at individual spans. But a native ES log search UI (i.e. Kibana as the most popular one) is MUCH better at searching for logs with complicated search requests that might, for example, contain regexes on log messages. Kibana can also nicely visualize logs, and you can build neat dashboards with graphs and the like. Jaeger UI is not there yet, and making the storage format Kibana-compatible makes a lot of sense to me.

@yurishkuro
Member

> The logs would be stored in exactly the same way, with something unique as a key. Possibly the timestamp, but I haven't put much thought in it yet.

That means you would always need to use wildcard search expressions to skip that unique key?

> The dots can be converted to something else like _ for storage, can't they?

Similarly, how would the user define search expressions? In Jaeger UI the tags will be "span.kind", but in ES it would be "span_kind", and the user would need to know that somehow.

BTW, I think Zipkin's ES implementation does something like what you're describing, but I haven't looked in detail how they deal with these issues.

@Monnoroch
Author

> Similarly, how would the user define search expressions? In Jaeger UI the tags will be "span.kind", but in ES it would be "span_kind", and the user would need to know that somehow.

It's certainly better than not being able to search at all, if you're not a programmer using curl and building search requests manually :)

In any case, I'm certainly not suggesting that this is something that must be done, I just think that this is a very useful feature that can solve a real problem of having to store logs twice: in spans for Jaeger and raw logs for searching and dashboarding AND having a separate infra for collecting those raw logs (though one probably would have it already, so this is not a major point).

Just something to consider.

@kacper-jackiewicz

Just FYI, I came across the same problem. As of now, the only workaround is to use scripted fields with a flattened tag list; searches on scripted fields are supported in Kibana 6.0+. However, this has rather terrible performance.

@Monnoroch
Author

Monnoroch commented Jul 6, 2018

@kacper-jackiewicz Can you please share the field definitions? This was exactly my idea too, but I must admit that I have failed to write correct scripts quickly myself. And this can probably be universally useful for "Jaeger over ES" users.

@kacper-jackiewicz

kacper-jackiewicz commented Jul 6, 2018

@Monnoroch My implementation is just a simple concatenation, using Painless:
params._source['tags'].stream().map(item->item['key']).collect(Collectors.joining())

//edit: forgot to mention you
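
For readers less familiar with Painless, a rough Python equivalent of that scripted field is sketched below. The `delimiter` parameter is an addition that is not in the original script: `Collectors.joining()` without arguments runs the keys together, so passing a separator may make the result easier to search. The function name `joined_tag_keys` is hypothetical.

```python
def joined_tag_keys(source, delimiter=""):
    """Concatenate all tag keys of a span document, like the scripted field above."""
    return delimiter.join(tag["key"] for tag in source.get("tags", []))


span = {
    "tags": [
        {"key": "component", "type": "string", "value": "nginx"},
        {"key": "http.method", "type": "string", "value": "POST"},
    ]
}
print(joined_tag_keys(span))        # keys run together, as in the Painless script
print(joined_tag_keys(span, " "))   # with a separator (not in the original)
```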

@mabn

mabn commented Jul 6, 2018

I also had this issue with Kibana. Additionally, switching from nested documents to a flat schema should help with query performance.
Anyway, the main constraint is the limit of fields in ES - index.mapping.total_fields.limit defaults to 1000 and can be increased, but I think I've read somewhere that 10k is too much.

My idea would be to create a structure like this:

"tags": {
   "component_string": "nginx",
   "nginx_worker_pid_long": 10767,
   "peer_address_string": "10.244.4.100:37542",
   "http_method_string": "POST",
   ...
}
...
"other_fields": [
   "randomtag=1234",
   "otherrandomtag=ajhsdj"
]
  1. there is no need to index the field type; it's sufficient to put it into the field name.
  2. "other_fields" allows storing an arbitrary number of key-value pairs and allows querying similar to what is available in Cassandra:
    • equals: use an ES terms query on other_fields with the string otherrandomtag=ajhsdj
    • prefix search: use an ES prefix query on other_fields with the string otherrandomtag=aj
  3. this allows using the long ES type for tags with long values and the boolean ES type for tags with boolean values, leading to appropriate and smaller indices.

The keys in tags would be supported nicely in Kibana, while the remaining ones in other_fields would still be query-able if needed.
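
A minimal sketch of how a span's tag list could be split into this structure. Everything here is an assumption for illustration: the function name `to_flat_schema`, the `CONVERTERS` table, and the choice of which keys count as indexed. The type names `string`, `int64` and `bool` follow the span document at the top of the issue, where values are stored as strings.

```python
# Hypothetical mapping: Jaeger tag type -> (field-name suffix, value converter).
CONVERTERS = {
    "string": ("string", str),
    "int64": ("long", int),
    "bool": ("boolean", lambda v: str(v).lower() == "true"),
}


def to_flat_schema(tags, indexed_keys):
    """Put selected tags into typed object fields, the rest into key=value strings."""
    flat = {}
    other_fields = []
    for tag in tags:
        key, value = tag["key"], tag["value"]
        if key in indexed_keys and tag["type"] in CONVERTERS:
            suffix, convert = CONVERTERS[tag["type"]]
            # Dots become underscores and the type goes into the field name.
            flat[f"{key.replace('.', '_')}_{suffix}"] = convert(value)
        else:
            other_fields.append(f"{key}={value}")
    return {"tags": flat, "other_fields": other_fields}


example = [
    {"key": "component", "type": "string", "value": "nginx"},
    {"key": "http.status_code", "type": "int64", "value": "204"},
    {"key": "randomtag", "type": "string", "value": "1234"},
]
print(to_flat_schema(example, {"component", "http.status_code"}))
```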

@yurishkuro
Member

yurishkuro commented Jul 6, 2018

Will ES blow up if the same tag name in different spans is set to different value type? E.g.

"tags": {
   "error": "true"
}

"tags": {
   "error": true
}

@mabn

mabn commented Jul 6, 2018

> Will ES blow up if the same tag name in different spans is set to different value type? E.g.

It should be defined either as string or boolean in the mapping. The value with incorrect type will either be rejected or coerced, depending on the settings.

@pavolloffay
Member

I like @mabn's idea #906 (comment), but:

  • Kibana users have to supply the type suffix
  • there is still the problem with dots in tag keys

Here is an example of zipkin index and data

{
        "_index" : "zipkin:span-2018-07-24",
        "_type" : "span",
        "_id" : "AWTMwYfRE_JqQdsPFSM5",
        "_score" : 1.0,
        "_source" : {
          "traceId" : "81d7ee7cd45c831a",
          "duration" : 361483,
          "remoteEndpoint" : {
            "ipv4" : "127.0.0.1",
            "port" : 55230
          },
          "shared" : true,
          "localEndpoint" : {
            "serviceName" : "testsleuthzipkin",
            "ipv4" : "10.33.144.152"
          },
          "timestamp_millis" : 1532443591469,
          "kind" : "SERVER",
          "name" : "get",
          "id" : "3b96174448804d8a",
          "parentId" : "81d7ee7cd45c831a",
          "timestamp" : 1532443591469579,
          "tags" : {
            "http.method" : "GET",
            "http.path" : "/hi2",
            "mvc.controller.class" : "SampleController",
            "mvc.controller.method" : "hi2",
            "random-sleep-millis" : "353"
          }
        }
}

{
  "zipkin:span-2018-07-24" : {
    "aliases" : { },
    "mappings" : {
      "span" : {
        "_source" : {
          "excludes" : [
            "_q"
          ]
        },
        "dynamic_templates" : [
          {
            "strings" : {
              "match" : "*",
              "match_mapping_type" : "string",
              "mapping" : {
                "ignore_above" : 256,
                "norms" : false,
                "type" : "keyword"
              }
            }
          }
        ],
        "properties" : {
          "_q" : {
            "type" : "keyword"
          },
          "annotations" : {
            "type" : "object",
            "enabled" : false
          },
          "duration" : {
            "type" : "long"
          },
          "id" : {
            "type" : "keyword",
            "ignore_above" : 256
          },
          "kind" : {
            "type" : "keyword",
            "ignore_above" : 256
          },
          "localEndpoint" : {
            "dynamic" : "false",
            "properties" : {
              "serviceName" : {
                "type" : "keyword"
              }
            }
          },
          "name" : {
            "type" : "keyword"
          },
          "parentId" : {
            "type" : "keyword",
            "ignore_above" : 256
          },
          "remoteEndpoint" : {
            "dynamic" : "false",
            "properties" : {
              "serviceName" : {
                "type" : "keyword"
              }
            }
          },
          "shared" : {
            "type" : "boolean"
          },
          "tags" : {
            "type" : "object",
            "enabled" : false
          },
          "timestamp" : {
            "type" : "long"
          },
          "timestamp_millis" : {
            "type" : "date",
            "format" : "epoch_millis"
          },
          "traceId" : {
            "type" : "keyword"
          }
        }
      },
      "_default_" : {
        "dynamic_templates" : [
          {
            "strings" : {
              "match" : "*",
              "match_mapping_type" : "string",
              "mapping" : {
                "ignore_above" : 256,
                "norms" : false,
                "type" : "keyword"
              }
            }
          }
        ]
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "provided_name" : "zipkin:span-2018-07-24",
        "mapper" : {
          "dynamic" : "false"
        },
        "creation_date" : "1532440048888",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "analysis" : {
          "filter" : {
            "traceId_filter" : {
              "type" : "pattern_capture",
              "preserve_original" : "true",
              "patterns" : [
                "([0-9a-f]{1,16})$"
              ]
            }
          },
          "analyzer" : {
            "traceId_analyzer" : {
              "filter" : "traceId_filter",
              "type" : "custom",
              "tokenizer" : "keyword"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "aOhYlv9lTCCAtMqwhDKWvQ",
        "version" : {
          "created" : "5061099"
        }
      }
    }
  }
}

@pavolloffay
Member

pavolloffay commented Aug 7, 2018

I have been digging into the Zipkin implementation. From the above output we can see that tags are modeled as an object but indexing is disabled (enabled: false). The query works on the _q field (keyword):

The search in Kibana does not work either. It only allows choosing a specific field, e.g. tags.http.path, whereas with the Jaeger index it's possible to select the whole tags field.

screenshot of kibana
screenshot of kibana 1

@pavolloffay
Member

I have also tried https://github.com/ppadovani/KibanaNestedSupportPlugin. For more details see https://github.com/pavolloffay/jaeger-kibana. The issue is that it defines its own query language. But the search worked.

I think we cannot change . to _. There are standard tags containing _ e.g. http.status_code https://github.com/opentracing/specification/blob/master/semantic_conventions.yaml#L17. We could only use a suffix (_string) to infer the type.

@pavolloffay
Member

Maybe we could use the array datatype https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html to store all tags in:

["http.url=/foo", "error=true"]. The type could be added as a suffix to the key, or a second array could contain the types.
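
A small sketch of the array idea: encode each tag as a "key=value" string and query the resulting keyword array with `term` (exact) or `prefix` queries. The field name `tag_pairs` and all function names are hypothetical; only the `term` and `prefix` query types come from the ES query DSL.

```python
import json


def encode_tags(tags):
    """Encode a tag map as a list of 'key=value' strings."""
    pairs = []
    for key, value in tags.items():
        # Strings are used as-is; other JSON scalars use their JSON form (True -> true).
        text = value if isinstance(value, str) else json.dumps(value)
        pairs.append(f"{key}={text}")
    return pairs


def equals_query(field, key, value):
    """Exact match on one encoded pair."""
    return {"query": {"term": {field: f"{key}={value}"}}}


def prefix_query(field, key, value_prefix):
    """Match pairs whose value starts with value_prefix."""
    return {"query": {"prefix": {field: f"{key}={value_prefix}"}}}


print(encode_tags({"http.url": "/foo", "error": True}))
print(equals_query("tag_pairs", "http.url", "/foo"))
print(prefix_query("tag_pairs", "http.url", "/f"))
```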

@objectiser
Contributor

I don't think the array approach is a good idea, as it would affect any processing (e.g. aggregations) of values in ES.

As far as I can tell, the only issue with the ideas mentioned by @Monnoroch and @mabn is the dots in the key. This can simply be resolved by selecting a different character that isn't used by OT standard tags, e.g. colon.

My preference would be @Monnoroch's approach, as it avoids the type suffix.
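
The separator question can be checked with a simple round trip. This sketch assumes the replacement character never occurs in a real tag key, which holds for a colon only as long as nobody uses colons in their tag names. The function names are hypothetical.

```python
def encode_key(key, replacement):
    """Replace dots so ES does not treat the key as a hierarchy."""
    return key.replace(".", replacement)


def decode_key(stored, replacement):
    """Restore dots; only correct if the replacement never appears in real keys."""
    return stored.replace(replacement, ".")


key = "http.status_code"
for replacement in ("_", ":"):
    stored = encode_key(key, replacement)
    restored = decode_key(stored, replacement)
    print(replacement, stored, restored, restored == key)
```

With an underscore the key comes back as http.status.code, because the original underscore cannot be told apart from a replaced dot; with a colon the round trip preserves this key.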

@Monnoroch
Author

@objectiser the second issue is the data types that need to be consistent for all spans. However, this is a good practice anyway, so this limitation is not necessarily critical. One can still store heterogeneous data in payload, it's just that it won't be indexed very well.

@pavolloffay
Member

Just for documentation:

> Will ES blow up if the same tag name in different spans is set to different value type? E.g.

For mapping: "tags":{"type":"object"} it does blow up if the

"tags":{
	"a": "true",
	"a": true
    },    

is present in the first span. ES returns
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"mapper [tags.a] of different type, current_type [text], merged_type [boolean]"}],"type":"illegal_argument_exception","reason":"mapper [tags.a] of different type, current_type [text], merged_type [boolean]"},"status":400}

However, it passes on the second span. The type of the property is set based on the value from the first field in the first span. E.g. if the first span contains a: "true", then the type is set to text.
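
The "first value decides the type" behaviour can be mimicked with a toy model. This is not ES code, and the coercion rules in `accepts` are simplifying assumptions; real behaviour depends on the mapping and index settings. All names are hypothetical.

```python
def infer_type(value):
    """Guess a field type from the first value seen, loosely like dynamic mapping."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    return "text"


def accepts(field_type, value):
    """Toy coercion rules (assumed, not taken from ES)."""
    if field_type == "text":
        return True  # scalars are stored in their string form
    if field_type == "boolean":
        return isinstance(value, bool) or value in ("true", "false")
    if field_type == "long":
        if isinstance(value, bool):
            return False
        try:
            int(value)
            return True
        except (TypeError, ValueError):
            return False
    return False


class ToyIndex:
    def __init__(self):
        self.mapping = {}

    def index(self, tags):
        for key, value in tags.items():
            # The first document to mention a key fixes its type.
            field_type = self.mapping.setdefault(key, infer_type(value))
            if not accepts(field_type, value):
                raise ValueError(f"tags.{key}: cannot store {value!r} as {field_type}")


index = ToyIndex()
index.index({"a": "true"})  # first span: tags.a becomes text
index.index({"a": True})    # second span: accepted in this model
print(index.mapping)
```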

@pavolloffay
Member

hi All,

I have submitted #980

tags: {
  "http:method": "GET",
}

Performance test with 300k spans and query
http://localhost:16686/api/traces?service=perf-test-thread-0&limit=50000&lookback=1h&tags={"fooo.bar*?%http.d6cconald":"hehuhoh$?ij","fooo.ba2sar":"true","fooo.ba4342r":"1","fooo.bar1":"fobarhax*+??","fooo.bar*?%http.do**2nald":"goobarRAXbaz","fooo.bar*?%http.don(a44ld":"goobarRAXbaz","fooo.ba24r*?%":"hehe"}. More results can be found in linked PRs.

limit 50000: mean = 13363.90 milliseconds
limit 1500: mean = 441.96 milliseconds
limit 20: mean = 17.88 milliseconds

Report time with multiple queries
191265.00 milliseconds = 3.18 min

and #982

tags: {
  "http:method": {
     "value": "GET",
     "type": "string"
  }
}

limit 50000: mean = 13533.82 milliseconds
limit 1500: mean = 681.32 milliseconds
limit 20: mean = 41.09 milliseconds

Report time with multiple queries
260223.00 milliseconds = 4.3 min

Results for master, tags as nested datatype:

tags: {
  {},{},{}
 }

limit 50000: mean = 12683.73 milliseconds
limit 1500: mean = 405.27 milliseconds
limit 20: mean = 26.40 milliseconds

Report time:
206348.00 milliseconds = 3.4 min

The biggest limitation when using the tag key as the object key is index.mapping.total_fields.limit. With the default index setting I was able to store 180 (#980) or 480 (#982) unique tags. Maybe it could be increased by overriding the default mapping and disabling the .raw field.

Given this limitation, I would like to use a combined mapping. OT standard tags (or configured ones) would be stored as the object datatype to enable querying in Kibana. Is there a consensus to move in this direction?
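
Since every distinct tag key becomes at least one mapped field under this scheme, it is the number of unique tag keys that runs into the limit. A minimal sketch for checking a sample of spans against a budget follows; the function names are hypothetical, and the budget of 180 is simply the figure observed above for #980, not a general constant.

```python
from collections import Counter


def unique_tag_keys(spans):
    """Count how often each tag key appears across a batch of span documents."""
    counts = Counter()
    for span in spans:
        for tag in span.get("tags", []):
            counts[tag["key"]] += 1
    return counts


def fits_budget(spans, max_tag_keys):
    """True if the batch uses no more distinct tag keys than the budget allows."""
    return len(unique_tag_keys(spans)) <= max_tag_keys


sample = [
    {"tags": [{"key": "http.method"}, {"key": "http.url"}]},
    {"tags": [{"key": "http.method"}, {"key": "custom.tenant"}]},
]
print(unique_tag_keys(sample).most_common())
print(fits_budget(sample, max_tag_keys=180))
```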

@pavolloffay
Member

@Monnoroch @mabn @kacper-jackiewicz @yurishkuro could you please have a look at the comment above? I would like to get it done quickly.

@pavolloffay
Member

The most important piece of my proposal is that only specified tags would be stored as a direct object. Other tags would still be stored as nested objects, as they are right now.

Specified tags are the standard OpenTracing tags https://github.com/opentracing/specification/blob/master/semantic_conventions.yaml#L9 plus configured tags.
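
A minimal sketch of the combined mapping: allow-listed tags go into an object keyed by tag name (with dots replaced by a colon, as in the http:method example above), and everything else stays in the nested list. The output field names `tag` and `tags`, the function name, and the allow-list contents are assumptions for illustration, not the merged implementation.

```python
def split_for_storage(tags, allowed_keys, dot_replacement=":"):
    """Split a span's tags into an object for allowed keys and a nested remainder."""
    tag_object = {}
    nested = []
    for tag in tags:
        if tag["key"] in allowed_keys:
            tag_object[tag["key"].replace(".", dot_replacement)] = tag["value"]
        else:
            nested.append(tag)  # kept in the current key/type/value form
    return {"tag": tag_object, "tags": nested}


span_tags = [
    {"key": "http.method", "type": "string", "value": "GET"},
    {"key": "custom.tag", "type": "string", "value": "x"},
]
print(split_for_storage(span_tags, allowed_keys={"http.method", "http.url"}))
```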

@Monnoroch
Author

@pavolloffay can we have an option "I am a good engineer and my tags are typed, please store them all in an object"? :)

@pavolloffay
Member

@Monnoroch the type is not problematic. There is a limit on unique field keys, please read #906 (comment).

@Monnoroch
Author

@pavolloffay ah! Missed that bit, thanks. Still, would be nice to have a flag for reversing the logic: blacklisting tags instead of whitelisting.

Made some comments in the PR.

@Monnoroch
Author

I expect that for most people 180 tags is a theoretical limitation rather than a practical one, really.

@pavolloffay
Member

We don't know how people are using them. If somebody is over the limit they would not be able to upgrade. The backup logic for figuring out what tag should be stored in different mapping would be very ugly...

@Monnoroch
Author

Yeah, but nowadays people use gRPC services more and more instead of raw HTTP, and there are other RPC frameworks. The standard will not be able to keep up, and the feature will become much less useful. Not to mention that bigger companies have their own mini frameworks with custom tag names.

My feeling is that having a reasonable number of tags is a more reasonable limitation than not being able to introduce your own tag names because you won't be able to search by them. With a whitelist I can only search by 20 pre-defined tags, while with a blacklist it's 180.

@norbertwnuk

@pavolloffay I agree with @Monnoroch: there must be a way to extend the standard list of tags (whitelist) to cover business-related tags and existing conventions in frameworks (e.g. guid:x-request-id from Istio). From that perspective, however, 'index.mapping.total_fields.limit' is problematic for large multi-tenant installations, where even a few custom tags per project will eventually add up to that limit. There are people reporting indexes running with 10k+ fields; ultimately, however, Jaeger should make ES storage multi-tenant aware too (e.g. index per day per application / namespace on K8S). Until then, it would be up to the administrator to track custom tags using the whitelist, where the most generic approach would be to just allow all tags using a '*' regex (to address @Monnoroch's remark about flexibility). I prefer #980 over #982 since I do not see a reason to store the original type for tags. Can you elaborate on the original reasons / use case, given that #980 works without it?
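
One way the whitelist, the blacklist and the "allow everything" case could share a single rule is sketched below. It uses shell-style patterns from Python's `fnmatch` module rather than regular expressions, since a bare '*' is not a valid regex on its own. The function name and the parameter names are hypothetical.

```python
from fnmatch import fnmatchcase


def stored_as_field(key, include=("*",), exclude=()):
    """Decide whether a tag key is stored as an object field.

    Exclusions win over inclusions; the default include pattern allows all keys.
    """
    if any(fnmatchcase(key, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(key, pattern) for pattern in include)


print(stored_as_field("guid:x-request-id"))                        # allow-all default
print(stored_as_field("http.url", exclude=("http.url",)))          # blacklist style
print(stored_as_field("custom.tag", include=("http.*", "guid:*"))) # whitelist style
```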

@pavolloffay
Member

pavolloffay commented Aug 20, 2018

The model follows the OpenTracing API, which allows different types for tag values. IIRC the tag type is only used in storage integration tests. Another consumer could be a post-processing job. @yurishkuro do we, or does Uber, use tag types?

@yurishkuro
Member

For reference, Zipkin and the OpenCensus API only support string tags.

OpenTracing supports typed tags, and Jaeger stores them as typed values, but so far we have not built any indexing capability that would make use of the types, such as supporting range queries for http.status_code as integer. However, the big data jobs can indeed use typed values.
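
As an illustration of what typed indexing would enable, here is a sketch of an ES `range` query body for status codes, plus the reason string values are not enough. The field name `tag.http:status_code` is hypothetical and assumes the value is mapped as a numeric type; nothing like this exists in Jaeger per the comment above.

```python
def status_range_query(field, low, high):
    """Build an ES range query body for low <= field < high."""
    return {"query": {"range": {field: {"gte": low, "lt": high}}}}


# Hypothetical field name; requires a numeric mapping to behave as a number range.
body = status_range_query("tag.http:status_code", 500, 600)
print(body)

# Compared as strings, "99" sorts after "500"; compared as integers it does not.
print("99" >= "500")
print(99 >= 500)
```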

@pavolloffay
Member

I have submitted #1018 as a final PR.

7 participants