High cardinality labels #91

sandstrom · 2018-12-13T13:52:18Z

For many logging scenarios it's useful to lookup/query the log data on high cardinality labels such as request-ids, user-ids, ip-addresses, request-url and so on.

While I realize that high cardinality will affect indices, I think it's worth discussing whether this is something that Loki can support in the future.

There are a lot of logging use-cases where people can easily ditch full-text search that ELK and others provide. But not having easy lookup on log metadata, such as user-id or request-id, is a problem.

I should note that this doesn't necessarily need to be "make labels high cardinality", it could also be the introduction of some type of log-line metadata or properties, that are allowed to be high cardinality and can be queried for.

Example

Our services (nginx, application servers, etc) emit JSON lines[1] like this:

{ "timestamp" : "2018-12-24T18:00:00", "user-id" : "abc", "request-id" : "123", "message" : "Parsing new user data" }
{ "timestamp" : "2018-12-24T18:01:00", "user-id: " "def", "request-id" : "456", "message" : "Updating user record" }
{ "timestamp" : "2018-12-24T18:02:00", "user-id: " "abc", "request-id" : "789", "message" : "Keep alive ping from client" }

We ingest these into our log storage (which we would like to replace with Loki), and here are some common types of tasks that we currently do:

Bring up logs for a particular user. Usually to troubleshoot some bug they are experiencing. Mostly we know the rough timeframe (for example that it occurred during the past 2 weeks). Such a filter will usually bring up 5-200 entries. If there are more than a few entries we'll usually filter a bit more, on a stricter time intervall or based on other properties (type of request, etc).
Find the logs for a particular request, based on its request id. Again, we'd usually know the rough timeframe, say +/- a few days.
Looking at all requests that hit a particular endpoint, basically filtering on 2-3 log entry properties.

All of these, which I guess are fairly common for a logging system, require high cardinality labels (or some type of metadata/properties that are high cardinality and can be queried).

[1] http://jsonlines.org/

Updates

A few updates, since this issue was originally opened.

This design proposal doesn't directly address high-cardinality labels, but solves some of the underlying problems.

With LogQL you can grep over large amounts of data fairly easily, and find those needles. It assumes that the queries can run fast enough on large log corpora.
There is some discussion in this thread, on making it easier to find unique IDs, such as trace-id and request-id

The text was updated successfully, but these errors were encountered:

sandstrom · 2018-12-13T13:58:52Z

A thought:

An alternative to high cardinality labels could be to introduce a complement to labels called metadata, that is allowed to be high cardinality. Having two different things could allow us to impose other restrictions on the metadata type of values.

If grep is quick enough, maybe it would work to not index the metadata key/value pairs. But still allow them to be filtered (at "grep speed"). That would allow you to give these high-cardinality key-value pairs blessed UI, for easy filtering, whilst still avoiding the cost of indexing high cardinality fields.

tomwilkie · 2018-12-14T22:35:01Z

While I realize that high cardinality will affect indices, I think it's worth discussing whether this is something that Loki can support in the future.

The metadata we index for each streams has the same restrictions as it does for Prometheus labels: I don't want to say never, but I'm not sure if this is something we're ever likely to support.

An alternative to high cardinality labels could be to introduce a complement to labels called metadata, that is allowed to be high cardinality.
But not having easy lookup on log metadata such as user-id or request-id may pose a problem.

Does the support for regexp filtering allow you to filter by a given user-id or request-id and achieve the same end as a second sets of non-index labels?

I'd agree its cumbersome, so perhaps adding some jq style language here to make this more natural for json logs would be better?

sandstrom · 2018-12-17T10:00:22Z

@tomwilkie It's all about speed, really. For our current ELK-stack — which we would like to replace with Loki — here are some common tasks:

Bring up logs for a particular user. Usually to troubleshoot some bug they are experiencing. Mostly we know the rough timeframe (for example that it occurred during the past 2 weeks). Such a filter will usually bring up 5-200 entries. If there are more than a few entries we'll usually filter a bit more, on a stricter time intervall or based on other properties (type of request, etc).
Find the logs for a particular request, based on its request id. Again, we'd usually know the rough timeframe, say +/- a few days.
Looking at all requests that hit a particular endpoint, basically filtering on 2-3 log entry properties.

Could these be supported by regexp filtering, grep style? Perhaps, but it would depend on how quick that filtering/lookup would be.

Some stats:

Storing 60 days of log data
26M log lines (with ~10 fields of metadata per line)
130 gb total
2-3 gb per day

Since our log volumes are so small, maybe it'll be easy to grep through?

Regarding your "jq style language" suggestion, I think that's a great idea! Even better would be UI support for key-value filtering on keys in the top-level of that json document. Usually people will have the log message as one key (for example message), and the metadata as other top-level keys.

Logging JSON lines seems to be common enough that it's worth building some UI/support around it:

I've copied parts of this comment into the top-level issue, for clarity. Still kept here too, for context/history.

mumoshu · 2019-03-25T03:33:38Z

Would this make sense in regard to storing/searching traces and spans, too?

I think I saw that Grafana had recently announced the "LGTM" stack, where T stands for Trace/Tracing. My "impression" on the announcement was that, you may be going to use Loki with optional indices on, say "trace id" and "span id", to make traces stored in Loki searchable, so that it can be an alternative datastore for Zipkin or Jaeger.

I don't have a strong opinion if this should be in Loki or not. Just wanted to discuss and get the idea where we should head in relation to the LGTM stack 😃

mumoshu · 2019-03-31T06:54:01Z

Never mind on my previous comment.

I think we don't need to force Loki to blur its scope and break its project goals here.

We already have other options like Elasticsearch(obviously), Badger, RocksDB, BoltDB, and so on when it comes to a trace/span storage, as seen in Jaeger jaegertracing/jaeger#760, for example.

Regarding distributed logging, nothing prevents us from implementing our own solution with promtail + badger for example, or using Loki in combination with an another distributed logging solution w/ a more richer indexing.

I'd just use Loki for light-weight, short-term, cluster-local distributed logging solution. I use Prometheus with a similar purpose for metrics. Loki remains super useful even without high cardinality labels support or secondary indices. Just my two cents 😃

candlerb · 2019-04-09T09:12:46Z

I've made a suggestion in point (2) here with a simple way of dealing with high cardinality values.

In short: keep the high-cardinality fields out of the label set, but parse them as pseudo-labels which are regenerated on demand when reading the logs.

stale · 2019-09-03T16:10:35Z

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

sandstrom · 2019-09-04T08:12:03Z

Stale bot: I don't think it makes sense to close this issue.

Although I understand that high cardinality labels may not be on the immediate roadmap, it's a common feature of log aggregation systems and by keeping it open there will at least be a place where this + workarounds can be discussed.

candlerb · 2019-09-04T09:13:28Z

An alternative to high cardinality labels could be to introduce a complement to labels called metadata, that is allowed to be high cardinality. Having two different things could allow us to impose other restrictions on the metadata type of values.

If grep is quick enough, maybe it would work to not index the metadata key/value pairs. But still allow them to be filtered (at "grep speed"). That would allow you to give these high-cardinality key-value pairs blessed UI, for easy filtering, whilst still avoiding the cost of indexing high cardinality fields.

This approach is very similar to influxdb, which has tags and fields. Influxdb tags are like loki labels: each set of unique tags defines a time series. Fields are just stored values.

Even linear searching would be "better than grep" speed, since if the query contains any labels, loki would already filter down to the relevant chunks.

Idea 1

Store arbitrary metadata alongside the line.

In order not to complicate the query language, I suggest that these stored-but-unindexed metadata fields look like labels with a special format: e.g. they start with a special symbol (e.g. _). Then you could do:

query {router="1.2.3.4", type="netflow", _src_ip=~"10\.10\.1\..*"} foo

In terms of the API, submitting records could be done the same way. Unfortunately, it would defeat the grouping of records with same label set, making the API more verbose and making it harder for loki to group together records belonging to the same chunk. Example:

{
  "streams": [
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\",_src_ip=\"10.10.1.50\"}",
      "entries": [{ "ts": "2018-12-18T08:28:06.801064-04:00", "line": "...netflow record 1..." }]
    },
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\",_src_ip=\"10.10.1.84\"}",
      "entries": [{ "ts": "2018-12-18T08:28:06.801065-04:00", "line": "...netflow record 2..." }]
    },
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\",_src_ip=\"10.10.1.42\"}",
      "entries": [{ "ts": "2018-12-18T08:28:06.801066-04:00", "line": "...netflow record 3..." }]
    }
  ]
}

So in the API it might be better to include the metadata inline in each record:

{
  "streams": [
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\"}",
      "entries": [
        { "ts": "2018-12-18T08:28:06.801064-04:00", "_src_ip": "10.10.1.50", "line": "...netflow record 1..." },
        { "ts": "2018-12-18T08:28:06.801065-04:00", "_src_ip": "10.10.1.84", "line": "...netflow record 2..." },
        { "ts": "2018-12-18T08:28:06.801066-04:00", "_src_ip": "10.10.1.42", "line": "...netflow record 3..." }
      ]
    }
  ]
}

In the grafana UI you could have one switch for displaying labels, and one for displaying metadata.

Wild idea 2

The same benefits as the above could be achieved if the "line" were itself a JSON record: stored in native JSON, and parsed and queried dynamically like jq.

You can do this today, if you just write serialized JSON into loki as a string. Note: this sits very well with the ELK/Beats way of doing things.

{
  "streams": [
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\"}",
      "entries": [
        { "ts": "2018-12-18T08:28:06.801064-04:00", "line":"{\"_src_ip\": \"10.10.1.50\", \"record\": \"...netflow record 1...\"}" },
        { "ts": "2018-12-18T08:28:06.801065-04:00", "line":"{\"_src_ip\": \"10.10.1.84\", \"record\": \"...netflow record 2...\"}" },
        { "ts": "2018-12-18T08:28:06.801066-04:00", "line":"{\"_src_ip\": \"10.10.1.42\", \"record\": \"...netflow record 3...\"}" }
      ]
    }
  ]
}

The only thing you can't do today is the server-side filtering of queries.

Therefore, you'd need to extend the query language to allow querying on fields of the JSON "line", in the main query part outside the label set. I think that using jq's language could be the ideal solution: the world does not need another query language, and it would allow querying down to arbitrary levels of nesting.

As an incremental improvement, you could then extend the API to natively accept a JSON object, handling the serialization internally:

{
  "streams": [
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\"}",
      "entries": [
        { "ts": "2018-12-18T08:28:06.801064-04:00", "line":{"_src_ip": "10.10.1.50", "record": "...netflow record 1..."} },
        { "ts": "2018-12-18T08:28:06.801065-04:00", "line":{"_src_ip": "10.10.1.84", "record": "...netflow record 2..."} },
        { "ts": "2018-12-18T08:28:06.801066-04:00", "line":{"_src_ip": "10.10.1.42", "record": "...netflow record 3..."} }
      ]
    }
  ]
}

On reading records back out via the API, it could be optional whether loki returns you the raw line inside a string, or (for valid JSON) gives you the object. There are certainly cases where you'd want to retain the string version, e.g. for logcli.

Storing a 1-bit flag against each line, saying whether it's valid JSON or not, could be a useful optimisation.

The big benefits of this approach are (a) no changes to underlying data model; (b) you gain much more power dealing with log data which is already natively JSON; (c) you gain the ability to filter on numeric fields, with numeric comparison operators like < and > (where 2 < 10, unlike "2" > "10")

Aside: for querying IP address ranges, regexps aren't great. One option would be to enhance the query language with IP-address specific operators. Another one is to convert IP addresses to fixed-width hexadecimal (e.g. 10.0.0.1 = "0a000001") before storing them in metadata fields; prefix operations can be converted to regexp mechanically.

e.g. 10.0.0.0/22 => 0a000[0123].* or better 0a000[0123][0-9a-f]{2}

kiney · 2019-09-04T09:27:42Z

I like both ideas, especially the second one. For the query language i think JMESPath would make more sense than the jq language as it is already supported for log processing in scrape config:
https://github.com/grafana/loki/blob/master/docs/logentry/processing-log-lines.md

I though about extracting some Info there and setting it as label. What would actually happen when creating high cardinality labels that way? "just" a very big index or would I run into real problems?

candlerb · 2019-09-04T09:41:30Z

What would actually happen when creating high cardinality labels that way? "just" a very big index or would I run into real problems?

You could end up with Loki trying to create, for example, 4 billion different timeseries, each containing one log line. Since a certain amount of metadata is required for each timeseries, and (I believe) each timeseries has its own chunks, it is likely to explode.

Also, trying to read back your logs will be incredibly inefficient, since it will have to merge those 4 billion timeseries back together for almost every query.

candlerb · 2019-09-04T11:20:50Z

@kiney: I hadn't come across JMESPath before, and I agree it would make sense to use the engine which is already being used. We need something that can be used as a predicate to filter a stream of JSON records, and ideally with a CLI for testing.

It looks like JMESPath's jp is the JMESPath equivalent to jq. (NOTE: this is different to the JSON Plotter jp which is what "brew install jp" gives you)

jp doesn't seem to be able to filter a stream of JSON objects like jq does. It just takes the first JSON item from stdin and ignores everything else.

As a workaround, loki could read batches of lines into JSON lists [ ... ] and apply JMESPath to each batch. Then you could use JMESPath list filter projections, e.g.

[?foo=='bar']

It needs to work in batches to avoid reading the whole stream into RAM: therefore
functions which operate over all records, like max_by, would not work. But to be fair, I don't think jq can do this either, without the -s (slurp into list) option, or unless you write your own functions.

I think the most common use case for logs is to filter them and return the entire record for each one selected. It would be interesting to make a list of the most common predicates and see how they are written in both jq and jp. e.g.

Select records containing top-level element "foo"
Select records where top-level element "foo" is string "bar"
Select records where top-level element "foo" is string "bar" and element "baz" is "qux"
Select records where top-level element "foo" matches regexp ^ba[rz]$ (*)
Select records where top-level numeric element "baz" is greater than 200
... etc

(*) Note: regular expressions are supported by jq but not as far as I can tell by JMESPath

Just taking the last example:

jq 'select(.baz > 200)' <<EOS
{"foo":"bar"}
{"foo":"baz"}
{"foo":"bar","baz":123}
{"foo":"bar","baz":789}
{"foo":"baz","baz":456}
EOS

=>
{
  "foo": "bar",
  "baz": 789
}
{
  "foo": "baz",
  "baz": 456
}

Compare:

jp '[?baz > `200`]' <<EOS
[{"foo":"bar"},
{"foo":"baz"},
{"foo":"bar","baz":123},
{"foo":"bar","baz":789},
{"foo":"baz","baz":456}]
EOS

=>
[
  {
    "baz": 789,
    "foo": "bar"
  },
  {
    "baz": 456,
    "foo": "baz"
  }
]

Note the non-obvious requirement for backticks around literals. If you omit them, the expression fails:

jp '[?baz > 200]' <<EOS
[{"foo":"bar"},
{"foo":"baz"},
{"foo":"bar","baz":123},
{"foo":"bar","baz":789},
{"foo":"baz","baz":456}]
EOS

=>
SyntaxError: Invalid token: tNumber
[?baz > 200]
        ^

String literals can either be written in single quotes, or wrapped in backticks and double-quotes. I find it pretty ugly, but then again, so are complex jq queries.

Aside: If you decide to give jq a set of records as a JSON list like we just did with jp, then it's a bit more awkward:

jq 'map(select(.baz > 200))' <<EOS  # or use -s to slurp into a list
[{"foo":"bar"},
{"foo":"baz"},
{"foo":"bar","baz":123},
{"foo":"bar","baz":789},
{"foo":"baz","baz":456}]
EOS

=>
[
  {
    "foo": "bar",
    "baz": 789
  },
  {
    "foo": "baz",
    "baz": 456
  }
]

But you probably wouldn't do that anyway with jq.

candlerb · 2019-09-04T11:27:05Z

There's another way jq could be used: just for predicate testing.

jq '.baz > 200' <<EOS
{"foo":"bar"}
{"foo":"baz"}
{"foo":"bar","baz":123}
{"foo":"bar","baz":789}
{"foo":"baz","baz":456}
EOS

=>
false
false
false
true
true

If loki sees the value 'false' it could drop the record, and the value 'true' it passes the original record through unchanged. Any other value would be passed through. This would allow the most common queries to be simplified.

kiney · 2019-09-04T11:59:42Z

@candlerb I also consider jq a slightly more pleasant language but JMESPath is already used in Loki and also the standard in other cloud tooling (aws cli, azure cli, ansible...).
I thought about using it similiar to current regex support: apply on each line, filter out lines with no match and highlight matched part in the remaining lines.

sandstrom · 2020-05-19T14:56:58Z

Updates

This design proposal doesn't directly address high-cardinality labels, but may solve the same underlying problem.

For some use-cases it would be equal to high-cardinality labels, as long as the LogQL queries can run fast enough on large log
There is some discussion in this thread, on making it easier to find unique IDs, such as trace-id and request-id

EDIT: copied to top comment

* Squashed 'tools/' changes from b783528..1fe184f 1fe184f Bazel rules for building gogo protobufs (grafana#123) b917bb8 Merge pull request grafana#122 from weaveworks/fix-scope-gc c029ce0 Add regex to match scope VMs 0d4824b Merge pull request grafana#121 from weaveworks/provisioning-readme-terraform 5a82d64 Move terraform instructions to tf section d285d78 Merge pull request grafana#120 from weaveworks/gocyclo-return-value 76b94a4 Do not spawn subshell when reading cyclo output 93b3c0d Use golang:1.9.2-stretch image d40728f Gocyclo should return error code if issues detected c4ac1c3 Merge pull request grafana#114 from weaveworks/tune-spell-check 8980656 Only check files 12ebc73 Don't spell-check pki files 578904a Special-case spell-check the same way we do code checks e772ed5 Special-case on mime type and extension using just patterns ae82b50 Merge pull request grafana#117 from weaveworks/test-verbose 8943473 Propagate verbose flag to 'go test'. 7c79b43 Merge pull request grafana#113 from weaveworks/update-shfmt-instructions 258ef01 Merge pull request grafana#115 from weaveworks/extra-linting e690202 Use tools in built image to lint itself 126eb56 Add shellcheck to bring linting in line with scope 63ad68f Don't run lint on files under .git 51d908a Update shfmt instructions e91cb0d Merge pull request grafana#112 from weaveworks/add-python-lint-tools 0c87554 Add yapf and flake8 to golang build image 35679ee Merge pull request grafana#110 from weaveworks/parallel-push-errors 3ae41b6 Remove unneeded if block 51ff31a Exit on first error 0faad9f Check for errors when pushing images in parallel 74dc626 Merge pull request grafana#108 from weaveworks/disable-apt-daily b4f1d91 Merge pull request grafana#107 from weaveworks/docker-17-update 7436aa1 Override apt daily job to not run immediately on boot 7980f15 Merge pull request grafana#106 from weaveworks/document-docker-install-role f741e53 Bump to Docker 17.06 from CE repo 61796a1 Update Docker CE Debian repo details 0d86f5e Allow for Docker package to be named docker-ce 065c68d Document selection of Docker installation role. 3809053 Just --porcelain; it defaults to v1 11400ea Merge pull request grafana#105 from weaveworks/remove-weaveplugin-remnants b8b4d64 remove weaveplugin remnants 35099c9 Merge pull request grafana#104 from weaveworks/pull-docker-py cdd48fc Pull docker-py to speed tests/builds up. e1c6c24 Merge pull request grafana#103 from weaveworks/test-build-tags d5d71e0 Add -tags option so callers can pass in build tags 8949b2b Merge pull request grafana#98 from weaveworks/git-status-tag ac30687 Merge pull request grafana#100 from weaveworks/python_linting 4b125b5 Pin yapf & flake8 versions 7efb485 Lint python linting function 444755b Swap diff direction to reflect changes required c5b2434 Install flake8 & yapf 5600eac Lint python in build-tools repo 0b02ca9 Add python linting c011c0d Merge pull request grafana#79 from kinvolk/schu/python-shebang 6577d07 Merge pull request grafana#99 from weaveworks/shfmt-version 00ce0dc Use git status instead of diff to add 'WIP' tag 411fd13 Use shfmt v1.3.0 instead of latest from master. 0d6d4da Run shfmt 1.3 on the code. 5cdba32 Add sudo c322ca8 circle.yml: Install shfmt binary. e59c225 Install shfmt 1.3 binary. 30706e6 Install pyhcl in the build container. 960d222 Merge pull request grafana#97 from kinvolk/alban/update-shfmt-3 1d535c7 shellcheck: fix escaping issue 5542498 Merge pull request grafana#96 from kinvolk/alban/update-shfmt-2 32f7cc5 shfmt: fix coding style 09f72af lint: print the diff in case of error 571c7d7 Merge pull request grafana#95 from kinvolk/alban/update-shfmt bead6ed Update for latest shfmt b08dc4d Update for latest shfmt (grafana#94) 2ed8aaa Add no-race argument to test script (grafana#92) 80dd78e Merge pull request grafana#91 from weaveworks/upgrade-go-1.8.1 08dcd0d Please ./lint as shfmt changed its rules between 1.0.0 and 1.3.0. a8bc9ab Upgrade default Go version to 1.8.1. 41c5622 Merge pull request grafana#90 from weaveworks/build-golang-service-conf e8ebdd5 broaden imagetag regex to fix haskell build image ba3fbfa Merge pull request grafana#89 from weaveworks/build-golang-service-conf e506f1b Fix up test script for updated shfmt 9216db8 Add stuff for service-conf build to build-goland image 66a9a93 Merge pull request grafana#88 from weaveworks/haskell-image cb3e3a2 shfmt 74a5239 Haskell build image 4ccd42b Trying circle quay login b2c295f Merge branch 'common-build' 0ac746f Trim quay prefix in circle script c405b31 Merge pull request grafana#87 from weaveworks/common-build 9672d7c Push build images to quay as they have sane robot accounts a2bf112 Review feedback fef9b7d Add protobuf tools 10a77ea Update readme 254f266 Don't need the image name in ffb59fc Adding a weaveworks/build-golang image with tags b817368 Update min Weave Net docker version cf87ca3 Merge pull request grafana#86 from weaveworks/lock-kubeadm-version 3ae6919 Add example of custom SSH private key to tf_ssh's usage. cf8bd8a Add example of custom SSH private key to tf_ansi's usage. c7d3370 Lock kubeadm's Kubernetes version. faaaa6f Merge pull request grafana#84 from weaveworks/centos-rhel ef552e7 Select weave-kube YAML URL based on K8S version. b4c1198 Upgrade default kubernetes_version to 1.6.1. b82805e Use a fixed version of kubeadm. f33888b Factorise and make kubeconfig option optional. f7b8b89 Install EPEL repo for CentOS. 615917a Fix error in decrypting AWS access key and secret. 86f97b4 Add CentOS 7 AMI and username for AWS via Terraform. eafd810 Add tf_ansi example with Ansible variables. 2b05787 Skip setup of Docker over TCP for CentOS/RHEL. 84c420b Add docker-ce role for CentOS/RHEL. 00a820c Add setup_weave-net_debug.yml playbook for user issues' debugging. 3eae480 Upgrade default kubernetes_version to 1.5.4. 753921c Allow injection of Docker installation role. e1ff90d Fix kubectl taint command for 1.5. b989e97 Fix typo in kubectl taint for single node K8S cluster. 541f58d Remove 'install_recommends: no' for ethtool. c3f9711 Make Ansible role docker-from-get.docker.com work on RHEL/CentOS. 038c0ae Add frequently used OS images, for convenience. d30649f Add --insecure-registry to docker.conf 1dd9218 shfmt -i 4 -w push-images 6de96ac Add option to not push docker hub images 310f53d Add push-images script from cortex 8641381 Add port 6443 to kubeadm join commands for K8S 1.6+. 50bf0bc Force type of K8S token to string. 08ab1c0 Remove trailing whitespaces. ae9efb8 Enable testing against K8S release candidates. 9e32194 Secure GCP servers for Scope: open port 80. a22536a Secure GCP servers for Scope. 89c3a29 Merge pull request grafana#78 from weaveworks/lint-merge-rebase-issue-in-docs 73ad56d Add linter function to avoid bad merge/rebase artefact 31d069d Change Python shebang to `#!/usr/bin/env python` 52d695c Merge pull request grafana#77 from kinvolk/schu/fix-relative-weave-path 77aed01 Merge pull request grafana#73 from weaveworks/mike/sched/fix-unicode-issue 7c080f4 integration/sanity_check: disable SC1090 d6d360a integration/gce.sh: update gcloud command e8def2c provisioning/setup: fix shellcheck SC2140 cc02224 integration/config: fix weave path 9c0d6a5 Fix config_management/README.md 334708c Merge pull request grafana#75 from kinvolk/alban/external-build-1 da2505d gce.sh: template: print creation date e676854 integration tests: fix user account 8530836 host nameing: add repo name b556c0a gce.sh: fix deletion of gce instances 2ecd1c2 integration: fix GCE --zones/--zone parameter 3e863df sched: Fix unicode encoding issues 51785b5 Use rm -f and set current dir using BASH_SOURCE. f5c6d68 Merge pull request grafana#71 from kinvolk/schu/fix-linter-warnings 0269628 Document requirement for `lint_sh` 9a3f09e Fix linter warnings efcf9d2 Merge pull request grafana#53 from weaveworks/2647-testing-mvp d31ea57 Weave Kube playbook now works with multiple nodes. 27868dd Add GCP firewall rule for FastDP crypto. edc8bb3 Differentiated name of dev and test playbooks, to avoid confusion. efa3df7 Moved utility Ansible Yaml to library directory. fcd2769 Add shorthands to run Ansible playbooks against Terraform-provisioned virtual machines. f7946fb Add shorthands to SSH into Terraform-provisioned virtual machines. aad5c6f Mention Terraform and Ansible in README.md. dddabf0 Add Terraform output required for templates' creation. dcc7d02 Add Ansible configuration playbooks for development environments. f86481c Add Ansible configuration playbooks for Docker, K8S and Weave-Net. efedd25 Git-ignore Ansible retry files. 765c4ca Add helper functions to setup Terraform programmatically. 801dd1d Add Terraform cloud provisioning scripts. b8017e1 Install hclfmt on CircleCI. 4815e19 Git-ignore Terraform state files. 0aaebc7 Add script to generate cartesian product of dependencies of cross-version testing. 007d90a Add script to list OS images from GCP, AWS and DO. ca65cc0 Add script to list relevant versions of Go, Docker and Kubernetes. aa66f44 Scripts now source dependencies using absolute path (previously breaking make depending on current directory). 7865e86 Add -p option to parallelise lint. 36c1835 Merge pull request grafana#69 from weaveworks/mflag 9857568 Use mflag and mflagext package from weaveworks/common. 9799112 Quote bash variable. 10a36b3 Merge pull request grafana#67 from weaveworks/shfmt-ignore a59884f Add support for .lintignore. 03cc598 Don't lint generated protobuf code. 2b55c2d Merge pull request grafana#66 from weaveworks/reduce-test-timeout d4e163c Make timeout a flag 49a8609 Reduce test timeout 8fa15cb Merge pull request grafana#63 from weaveworks/test-defaults git-subtree-dir: tools git-subtree-split: 1fe184f1f5330c4444c4377bef84f2d30e7dc7fe * Use keyed fields in composite literal * Squashed 'tools/' changes from 1fe184f..ccc8316 ccc8316 Revert "Gocyclo should return error code if issues detected" (grafana#124) git-subtree-dir: tools git-subtree-split: ccc831682b5d51e068b17fe9ad482f025abd1fbb

[release-5.6] Add release channel stable-5.6

feldentm-SAP · 2022-12-14T07:37:20Z

This also affects handling of OpenTelemetry attributes and TraceContext especially if the set of attributes is not trivial. If you have thousands of service instances in your Loki and you want to do queries like "give me all lines that have a certain action attribute set and failed" you cannot restrict that query on instances. Restricting it by time is also an issue because you might miss some interesting events. Not having it is an issue in detecting root causes of sporadic defects.

lsampras · 2023-07-04T09:23:22Z

Are there any further updates on this issue..

I see some solutions being built for trace-id use-case...

But is there a plan for supporting user-defined high-cardinality labels over a large time span...

candlerb · 2023-07-04T11:05:47Z

I think you can do what you need using the LogQL json filter. This extracts "labels" from the JSON body, but they are not logstream labels, so they don't affect cardinality.

These are not indexed, so it will perform a linear search across the whole timespan of interest. If you scale up Loki then it might be able to parallelize this across chunks - I'm not sure.

But if you want fully indexed searching, then I believe Loki isn't for you. You can either:

throw a ton of resources at the problem with Elasticsearch (tons of RAM, tons of SSD, expect your log volumes to increase by a factor of 10); or
dump all your logs into one of the "big data" products (hive, druid, kudu, drill, clickhouse etc), possibly using an on-disk format like parquet optimised for columnar searching; or
dip your toe in the water with the recently-announced VictoriaLogs. This looks like it hits a sweet spot with word splitting and indexing, but is not yet considered production-ready. Maybe Loki will consider adding something similar in future.

sandstrom · 2023-07-04T11:20:53Z

For others interested in this, this is the most recent "statement" I've seen from the Loki team:

#1282 (comment)

lsampras · 2023-07-04T12:38:45Z

But if you want fully indexed searching, then I believe Loki isn't for you.

Yeah,
Creating a complete index would be out-of-scope and against the design of Loki

But are we considering some middle ground
for e.g

the ability to define certain labels for which chunks would be bloom filtered instead of separate streams
letting users specify chunk id's in the loki query request so that they can build/maintain their own indexing etc.
we've also had some ideas around adding a co-processor

I wanted to gauge what in this case would we be willing to extend in loki to support this (without going against its core design principles)

valyala · 2023-07-13T04:53:46Z

Loki can implement support for high-cardinality labels in the way similar to VictoriaLogs - it just doesn't associate these labels with log streams. Instead, it stores them inside the stream - see storage architecture details. This allows resolving the following issues:

High RAM usage, OOM crashes, high number of open files, significant slowdown
Slow query performance when high-cardinality labels are stored in the log message itself.

The high-cardinality labels can then be queried with label_name:value filters - see these docs for details.

sandstrom · 2024-01-12T15:59:06Z

Time flies!

Issue number 91 might finally have a nice solution! 🎉

I don't want to say never, but I'm not sure if this is something we're ever likely to support.

@tomwilkie I'm happy it wasn't never 😄

Six years later, there is a solution on the horizon:

(not GA just yet though)

feldentm-SAP · 2024-01-17T14:13:58Z

Hi @sandstrom,

please inform us and your support organization, once this is GA.
Also, please make sure that the GA version will allow ingesting data that has the structured-metadata already irrespective of the kind of log source. For OTLP, there should be means of configuring what to keep or transforming the log into whatever your internal format is going to be.

Kind regards and thanks for the great contribution

sandstrom · 2024-04-09T17:37:38Z

A quick update for people following this. Loki 3.0 has shipped with experimental bloom filters, that are addressing exactly this issue. This is awesome news! 🎉

https://grafana.com/blog/2024/04/09/grafana-loki-3.0-release-all-the-new-features/

I'll keep this open until it's no longer experimental, which is probably in 3.1 I'd guess, unless a team member of Loki wants to close this out already, since it has shipped (albeit in an experimental state) -- I'm fine with either.

chaudum · 2024-04-29T19:20:51Z

I'll keep this open until it's no longer experimental, which is probably in 3.1 I'd guess, unless a team member of Loki wants to close this out already, since it has shipped (albeit in an experimental state) -- I'm fine with either.

Sounds good to me 😄

sandstrom mentioned this issue Dec 13, 2018

Set metadata/labels via log files (ingest time) #88

Closed

tomwilkie added component/loki type/feature Something new we should do labels Dec 18, 2018

mumoshu mentioned this issue Mar 25, 2019

Question: support indexing as optional #217

Closed

stale bot added the stale A stale issue or PR that will automatically be closed. label Sep 3, 2019

stale bot removed the stale A stale issue or PR that will automatically be closed. label Sep 4, 2019

cyriltovena added the keepalive An issue or PR that will be kept alive and never marked as stale. label Sep 5, 2019

sandstrom mentioned this issue Nov 29, 2019

Add inverted index from specific text to particular chunk. #1282

Open

electron0zero mentioned this issue Feb 9, 2021

Implementing structured logs grafana/smtprelay#10

Merged

periklis added a commit to periklis/loki that referenced this issue Dec 6, 2021

Make Dockerfile compatible for multi-platform builds (grafana#91)

6a85d7f

periklis pushed a commit to periklis/loki that referenced this issue Dec 12, 2022

Merge pull request grafana#91 from periklis/fix-release-5.6-channels

f406f35

[release-5.6] Add release channel stable-5.6

This was referenced Feb 20, 2023

[new feature] introduce Coprocessor #8559

Open

[new feature] introduce loki Coprocessor querier pre query ,And provider a golang demo XRayCoprocessor. #8568

Open

This was referenced Sep 6, 2023

introduce columnar chunk format #5723

Open

LID: introduce Coprocessor #8616

Closed

sandstrom mentioned this issue Apr 9, 2024

Loki 3.0 Feedback and Issues #12506

Open

chaudum added the feature/blooms label Apr 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

High cardinality labels #91

High cardinality labels #91

sandstrom commented Dec 13, 2018 •

edited

Loading

sandstrom commented Dec 13, 2018

tomwilkie commented Dec 14, 2018

sandstrom commented Dec 17, 2018 •

edited

Loading

mumoshu commented Mar 25, 2019

mumoshu commented Mar 31, 2019 •

edited

Loading

candlerb commented Apr 9, 2019

stale bot commented Sep 3, 2019

sandstrom commented Sep 4, 2019

candlerb commented Sep 4, 2019 •

edited

Loading

kiney commented Sep 4, 2019

candlerb commented Sep 4, 2019

candlerb commented Sep 4, 2019

candlerb commented Sep 4, 2019

kiney commented Sep 4, 2019

sandstrom commented May 19, 2020 •

edited

Loading

feldentm-SAP commented Dec 14, 2022

lsampras commented Jul 4, 2023

candlerb commented Jul 4, 2023

sandstrom commented Jul 4, 2023

lsampras commented Jul 4, 2023

valyala commented Jul 13, 2023

sandstrom commented Jan 12, 2024 •

edited

Loading

feldentm-SAP commented Jan 17, 2024

sandstrom commented Apr 9, 2024

chaudum commented Apr 29, 2024

High cardinality labels #91

High cardinality labels #91

Comments

sandstrom commented Dec 13, 2018 • edited Loading

Example

Updates

sandstrom commented Dec 13, 2018

tomwilkie commented Dec 14, 2018

sandstrom commented Dec 17, 2018 • edited Loading

mumoshu commented Mar 25, 2019

mumoshu commented Mar 31, 2019 • edited Loading

candlerb commented Apr 9, 2019

stale bot commented Sep 3, 2019

sandstrom commented Sep 4, 2019

candlerb commented Sep 4, 2019 • edited Loading

Idea 1

Wild idea 2

kiney commented Sep 4, 2019

candlerb commented Sep 4, 2019

candlerb commented Sep 4, 2019

candlerb commented Sep 4, 2019

kiney commented Sep 4, 2019

sandstrom commented May 19, 2020 • edited Loading

Updates

feldentm-SAP commented Dec 14, 2022

lsampras commented Jul 4, 2023

candlerb commented Jul 4, 2023

sandstrom commented Jul 4, 2023

lsampras commented Jul 4, 2023

valyala commented Jul 13, 2023

sandstrom commented Jan 12, 2024 • edited Loading

feldentm-SAP commented Jan 17, 2024

sandstrom commented Apr 9, 2024

chaudum commented Apr 29, 2024

sandstrom commented Dec 13, 2018 •

edited

Loading

sandstrom commented Dec 17, 2018 •

edited

Loading

mumoshu commented Mar 31, 2019 •

edited

Loading

candlerb commented Sep 4, 2019 •

edited

Loading

sandstrom commented May 19, 2020 •

edited

Loading

sandstrom commented Jan 12, 2024 •

edited

Loading