High cardinality labels #91

Open

sandstrom opened this issue Dec 13, 2018 · 25 comments
Labels
component/loki, feature/blooms, keepalive (An issue or PR that will be kept alive and never marked as stale.), type/feature (Something new we should do)

Comments

@sandstrom

sandstrom commented Dec 13, 2018

For many logging scenarios it's useful to look up/query the log data on high-cardinality labels such as request-ids, user-ids, ip-addresses, request-urls and so on.

While I realize that high cardinality will affect indices, I think it's worth discussing whether this is something that Loki can support in the future.

There are a lot of logging use-cases where people can easily ditch full-text search that ELK and others provide. But not having easy lookup on log metadata, such as user-id or request-id, is a problem.

I should note that this doesn't necessarily need to be "make labels high cardinality", it could also be the introduction of some type of log-line metadata or properties, that are allowed to be high cardinality and can be queried for.

Example

Our services (nginx, application servers, etc) emit JSON lines[1] like this:

{ "timestamp" : "2018-12-24T18:00:00", "user-id" : "abc", "request-id" : "123", "message" : "Parsing new user data" }
{ "timestamp" : "2018-12-24T18:01:00", "user-id: " "def", "request-id" : "456", "message" : "Updating user record" }
{ "timestamp" : "2018-12-24T18:02:00", "user-id: " "abc", "request-id" : "789", "message" : "Keep alive ping from client" }

We ingest these into our log storage (which we would like to replace with Loki), and here are some common types of tasks that we currently do:

  1. Bring up logs for a particular user. Usually to troubleshoot some bug they are experiencing. Mostly we know the rough timeframe (for example that it occurred during the past 2 weeks). Such a filter will usually bring up 5-200 entries. If there are more than a few entries we'll usually filter a bit more, on a stricter time interval or based on other properties (type of request, etc).

  2. Find the logs for a particular request, based on its request id. Again, we'd usually know the rough timeframe, say +/- a few days.

  3. Looking at all requests that hit a particular endpoint, basically filtering on 2-3 log entry properties.

All of these, which I guess are fairly common for a logging system, require high cardinality labels (or some type of metadata/properties that are high cardinality and can be queried).

[1] http://jsonlines.org/

Updates

A few updates, since this issue was originally opened.

  • This design proposal doesn't directly address high-cardinality labels, but solves some of the underlying problems.

    With LogQL you can grep over large amounts of data fairly easily, and find those needles. It assumes that the queries can run fast enough on large log corpora.

  • There is some discussion in this thread, on making it easier to find unique IDs, such as trace-id and request-id

@sandstrom
Author

A thought:

An alternative to high cardinality labels could be to introduce a complement to labels called metadata, that is allowed to be high cardinality. Having two different things could allow us to impose other restrictions on the metadata type of values.

If grep is quick enough, maybe it would work to not index the metadata key/value pairs. But still allow them to be filtered (at "grep speed"). That would allow you to give these high-cardinality key-value pairs blessed UI, for easy filtering, whilst still avoiding the cost of indexing high cardinality fields.

@tomwilkie
Contributor

While I realize that high cardinality will affect indices, I think it's worth discussing whether this is something that Loki can support in the future.

The metadata we index for each stream has the same restrictions as it does for Prometheus labels: I don't want to say never, but I'm not sure if this is something we're ever likely to support.

An alternative to high cardinality labels could be to introduce a complement to labels called metadata, that is allowed to be high cardinality.
But not having easy lookup on log metadata such as user-id or request-id may pose a problem.

Does the support for regexp filtering allow you to filter by a given user-id or request-id and achieve the same end as a second set of non-indexed labels?

I'd agree it's cumbersome, so perhaps adding some jq-style language here to make this more natural for JSON logs would be better?
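
For illustration, here is a minimal sketch of what that grep-style filtering could look like with LogQL line filters (the stream selector {app="myapp"} and the exact field formatting are assumptions taken from the JSON example at the top of this issue):

{app="myapp"} |= `"request-id" : "456"`
{app="myapp"} |~ `"user-id"\s*:\s*"abc"`

The first form is an exact substring match; the second is a regexp that tolerates whitespace differences around the separator, at the cost of evaluating a regexp per line.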

@sandstrom
Author

sandstrom commented Dec 17, 2018

@tomwilkie It's all about speed, really. For our current ELK-stack — which we would like to replace with Loki — here are some common tasks:

  1. Bring up logs for a particular user. Usually to troubleshoot some bug they are experiencing. Mostly we know the rough timeframe (for example that it occurred during the past 2 weeks). Such a filter will usually bring up 5-200 entries. If there are more than a few entries we'll usually filter a bit more, on a stricter time interval or based on other properties (type of request, etc).

  2. Find the logs for a particular request, based on its request id. Again, we'd usually know the rough timeframe, say +/- a few days.

  3. Looking at all requests that hit a particular endpoint, basically filtering on 2-3 log entry properties.

Could these be supported by regexp filtering, grep style? Perhaps, but it would depend on how quick that filtering/lookup would be.

Some stats:

  • Storing 60 days of log data
  • 26M log lines (with ~10 fields of metadata per line)
  • 130 GB total
  • 2-3 GB per day

Since our log volumes are so small, maybe it'll be easy to grep through?


Regarding your "jq style language" suggestion, I think that's a great idea! Even better would be UI support for key-value filtering on keys in the top-level of that json document. Usually people will have the log message as one key (for example message), and the metadata as other top-level keys.

Logging JSON lines seems to be common enough that it's worth building some UI/support around it.


I've copied parts of this comment into the top-level issue, for clarity. Still kept here too, for context/history.

@mumoshu

mumoshu commented Mar 25, 2019

Would this make sense in regard to storing/searching traces and spans, too?

I think I saw that Grafana had recently announced the "LGTM" stack, where T stands for Trace/Tracing. My impression from the announcement was that you may be planning to use Loki with optional indices on, say, "trace id" and "span id", to make traces stored in Loki searchable, so that it can be an alternative datastore for Zipkin or Jaeger.

I don't have a strong opinion if this should be in Loki or not. Just wanted to discuss and get the idea where we should head in relation to the LGTM stack 😃

@mumoshu

mumoshu commented Mar 31, 2019

Never mind on my previous comment.

I think we don't need to force Loki to blur its scope and break its project goals here.

We already have other options like Elasticsearch(obviously), Badger, RocksDB, BoltDB, and so on when it comes to a trace/span storage, as seen in Jaeger jaegertracing/jaeger#760, for example.

Regarding distributed logging, nothing prevents us from implementing our own solution with promtail + badger for example, or using Loki in combination with another distributed logging solution with richer indexing.

I'd just use Loki as a lightweight, short-term, cluster-local distributed logging solution. I use Prometheus for a similar purpose with metrics. Loki remains super useful even without high-cardinality label support or secondary indices. Just my two cents 😃

@candlerb
Contributor

candlerb commented Apr 9, 2019

I've made a suggestion in point (2) here with a simple way of dealing with high cardinality values.

In short: keep the high-cardinality fields out of the label set, but parse them as pseudo-labels which are regenerated on demand when reading the logs.

@stale

stale bot commented Sep 3, 2019

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the "stale" label (A stale issue or PR that will automatically be closed.) on Sep 3, 2019
@sandstrom
Author

Stale bot: I don't think it makes sense to close this issue.

Although I understand that high-cardinality labels may not be on the immediate roadmap, they're a common feature of log aggregation systems, and keeping this issue open at least provides a place where the feature and possible workarounds can be discussed.

@stale stale bot removed the "stale" label (A stale issue or PR that will automatically be closed.) on Sep 4, 2019
@candlerb
Contributor

candlerb commented Sep 4, 2019

An alternative to high cardinality labels could be to introduce a complement to labels called metadata, that is allowed to be high cardinality. Having two different things could allow us to impose other restrictions on the metadata type of values.

If grep is quick enough, maybe it would work to not index the metadata key/value pairs. But still allow them to be filtered (at "grep speed"). That would allow you to give these high-cardinality key-value pairs blessed UI, for easy filtering, whilst still avoiding the cost of indexing high cardinality fields.

This approach is very similar to InfluxDB, which has tags and fields. InfluxDB tags are like Loki labels: each unique set of tags defines a time series. Fields are just stored values.

Even a linear search would be "better than grep" speed, since if the query contains any labels, Loki would already have filtered down to the relevant chunks.

Idea 1

Store arbitrary metadata alongside the line.

In order not to complicate the query language, I suggest that these stored-but-unindexed metadata fields look like labels with a special format: e.g. they start with a special symbol (e.g. _). Then you could do:

query {router="1.2.3.4", type="netflow", _src_ip=~"10\.10\.1\..*"} foo

In terms of the API, submitting records could be done the same way. Unfortunately, it would defeat the grouping of records with the same label set, making the API more verbose and making it harder for Loki to group together records belonging to the same chunk. Example:

{
  "streams": [
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\",_src_ip=\"10.10.1.50\"}",
      "entries": [{ "ts": "2018-12-18T08:28:06.801064-04:00", "line": "...netflow record 1..." }]
    },
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\",_src_ip=\"10.10.1.84\"}",
      "entries": [{ "ts": "2018-12-18T08:28:06.801065-04:00", "line": "...netflow record 2..." }]
    },
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\",_src_ip=\"10.10.1.42\"}",
      "entries": [{ "ts": "2018-12-18T08:28:06.801066-04:00", "line": "...netflow record 3..." }]
    }
  ]
}

So in the API it might be better to include the metadata inline in each record:

{
  "streams": [
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\"}",
      "entries": [
        { "ts": "2018-12-18T08:28:06.801064-04:00", "_src_ip": "10.10.1.50", "line": "...netflow record 1..." },
        { "ts": "2018-12-18T08:28:06.801065-04:00", "_src_ip": "10.10.1.84", "line": "...netflow record 2..." },
        { "ts": "2018-12-18T08:28:06.801066-04:00", "_src_ip": "10.10.1.42", "line": "...netflow record 3..." }
      ]
    }
  ]
}

In the grafana UI you could have one switch for displaying labels, and one for displaying metadata.

Wild idea 2

The same benefits as the above could be achieved if the "line" were itself a JSON record: stored in native JSON, and parsed and queried dynamically like jq.

You can do this today, if you just write serialized JSON into loki as a string. Note: this sits very well with the ELK/Beats way of doing things.

{
  "streams": [
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\"}",
      "entries": [
        { "ts": "2018-12-18T08:28:06.801064-04:00", "line":"{\"_src_ip\": \"10.10.1.50\", \"record\": \"...netflow record 1...\"}" },
        { "ts": "2018-12-18T08:28:06.801065-04:00", "line":"{\"_src_ip\": \"10.10.1.84\", \"record\": \"...netflow record 2...\"}" },
        { "ts": "2018-12-18T08:28:06.801066-04:00", "line":"{\"_src_ip\": \"10.10.1.42\", \"record\": \"...netflow record 3...\"}" }
      ]
    }
  ]
}

The only thing you can't do today is the server-side filtering of queries.

Therefore, you'd need to extend the query language to allow querying on fields of the JSON "line", in the main query part outside the label set. I think that using jq's language could be the ideal solution: the world does not need another query language, and it would allow querying down to arbitrary levels of nesting.

As an incremental improvement, you could then extend the API to natively accept a JSON object, handling the serialization internally:

{
  "streams": [
    {
      "labels": "{type=\"netflow\",router=\"1.2.3.4\"}",
      "entries": [
        { "ts": "2018-12-18T08:28:06.801064-04:00", "line":{"_src_ip": "10.10.1.50", "record": "...netflow record 1..."} },
        { "ts": "2018-12-18T08:28:06.801065-04:00", "line":{"_src_ip": "10.10.1.84", "record": "...netflow record 2..."} },
        { "ts": "2018-12-18T08:28:06.801066-04:00", "line":{"_src_ip": "10.10.1.42", "record": "...netflow record 3..."} }
      ]
    }
  ]
}

On reading records back out via the API, it could be optional whether loki returns you the raw line inside a string, or (for valid JSON) gives you the object. There are certainly cases where you'd want to retain the string version, e.g. for logcli.

Storing a 1-bit flag against each line, saying whether it's valid JSON or not, could be a useful optimisation.

The big benefits of this approach are (a) no changes to underlying data model; (b) you gain much more power dealing with log data which is already natively JSON; (c) you gain the ability to filter on numeric fields, with numeric comparison operators like < and > (where 2 < 10, unlike "2" > "10")


Aside: for querying IP address ranges, regexps aren't great. One option would be to enhance the query language with IP-address specific operators. Another one is to convert IP addresses to fixed-width hexadecimal (e.g. 10.0.0.1 = "0a000001") before storing them in metadata fields; prefix operations can be converted to regexp mechanically.

e.g. 10.0.0.0/22 => 0a000[0123].* or better 0a000[0123][0-9a-f]{2}
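
A rough sketch of that mechanical conversion (hypothetical helper names; for simplicity it only handles prefix lengths that are a multiple of 4 bits, so the /22 case above would still need the partial-nibble character class such as 0a000[0123]):

package main

import (
	"fmt"
	"net"
)

// toHex renders an IPv4 address as fixed-width hexadecimal, e.g. 10.0.0.1 -> "0a000001".
// (IPv4 only; nil checks are omitted to keep the sketch short.)
func toHex(ip net.IP) string {
	v4 := ip.To4()
	return fmt.Sprintf("%02x%02x%02x%02x", v4[0], v4[1], v4[2], v4[3])
}

// cidrToRegex converts a CIDR prefix into a regexp over the hex form.
// Only prefix lengths divisible by 4 are handled; a /22 additionally needs a
// character class for the partly-fixed nibble, as shown above.
func cidrToRegex(cidr string) string {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return ""
	}
	ones, _ := ipnet.Mask.Size()
	fixed := ones / 4 // hex digits fully determined by the prefix
	return "^" + toHex(ipnet.IP)[:fixed] + fmt.Sprintf("[0-9a-f]{%d}$", 8-fixed)
}

func main() {
	fmt.Println(toHex(net.ParseIP("10.0.0.1"))) // 0a000001
	fmt.Println(cidrToRegex("10.0.0.0/24"))     // ^0a0000[0-9a-f]{2}$
}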

@kiney

kiney commented Sep 4, 2019

I like both ideas, especially the second one. For the query language I think JMESPath would make more sense than the jq language, as it is already supported for log processing in the scrape config:
https://github.com/grafana/loki/blob/master/docs/logentry/processing-log-lines.md

I thought about extracting some info there and setting it as a label. What would actually happen when creating high-cardinality labels that way? "Just" a very big index, or would I run into real problems?

@candlerb
Contributor

candlerb commented Sep 4, 2019

What would actually happen when creating high-cardinality labels that way? "Just" a very big index, or would I run into real problems?

You could end up with Loki trying to create, for example, 4 billion different timeseries, each containing one log line. Since a certain amount of metadata is required for each timeseries, and (I believe) each timeseries has its own chunks, it is likely to explode.

Also, trying to read back your logs will be incredibly inefficient, since it will have to merge those 4 billion timeseries back together for almost every query.

@candlerb
Contributor

candlerb commented Sep 4, 2019

@kiney: I hadn't come across JMESPath before, and I agree it would make sense to use the engine which is already being used. We need something that can be used as a predicate to filter a stream of JSON records, and ideally with a CLI for testing.

It looks like JMESPath's jp is the JMESPath equivalent to jq. (NOTE: this is different to the JSON Plotter jp which is what "brew install jp" gives you)

jp doesn't seem to be able to filter a stream of JSON objects like jq does. It just takes the first JSON item from stdin and ignores everything else.

As a workaround, loki could read batches of lines into JSON lists [ ... ] and apply JMESPath to each batch. Then you could use JMESPath list filter projections, e.g.

[?foo=='bar']

It needs to work in batches to avoid reading the whole stream into RAM; therefore functions which operate over all records, like max_by, would not work. But to be fair, I don't think jq can do this either, without the -s (slurp into list) option, or unless you write your own functions.
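
A minimal sketch of that batching approach using the go-jmespath library (the batch size and the example expression are arbitrary; lines that aren't valid JSON are simply skipped, and error handling is kept minimal):

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"

	"github.com/jmespath/go-jmespath"
)

func main() {
	const batchSize = 1000
	expr := "[?baz > `200`]" // JMESPath filter projection applied to each batch (a JSON list)

	scanner := bufio.NewScanner(os.Stdin)
	batch := make([]interface{}, 0, batchSize)

	flush := func() {
		if len(batch) == 0 {
			return
		}
		// Apply the expression to the whole batch; matching records are returned as a list.
		result, err := jmespath.Search(expr, batch)
		if err == nil {
			if matches, ok := result.([]interface{}); ok {
				for _, m := range matches {
					out, _ := json.Marshal(m)
					fmt.Println(string(out))
				}
			}
		}
		batch = batch[:0]
	}

	for scanner.Scan() {
		var rec interface{}
		if err := json.Unmarshal(scanner.Bytes(), &rec); err != nil {
			continue // skip lines that aren't valid JSON
		}
		batch = append(batch, rec)
		if len(batch) == batchSize {
			flush()
		}
	}
	flush()
}

Only per-record predicates work this way; anything that aggregates across the whole stream (max_by and friends) would see just one batch at a time, as noted above.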

I think the most common use case for logs is to filter them and return the entire record for each one selected. It would be interesting to make a list of the most common predicates and see how they are written in both jq and jp. e.g.

  • Select records containing top-level element "foo"
  • Select records where top-level element "foo" is string "bar"
  • Select records where top-level element "foo" is string "bar" and element "baz" is "qux"
  • Select records where top-level element "foo" matches regexp ^ba[rz]$ (*)
  • Select records where top-level numeric element "baz" is greater than 200
    ... etc

(*) Note: regular expressions are supported by jq but not as far as I can tell by JMESPath

Just taking the last example:

jq 'select(.baz > 200)' <<EOS
{"foo":"bar"}
{"foo":"baz"}
{"foo":"bar","baz":123}
{"foo":"bar","baz":789}
{"foo":"baz","baz":456}
EOS

=>
{
  "foo": "bar",
  "baz": 789
}
{
  "foo": "baz",
  "baz": 456
}

Compare:

jp '[?baz > `200`]' <<EOS
[{"foo":"bar"},
{"foo":"baz"},
{"foo":"bar","baz":123},
{"foo":"bar","baz":789},
{"foo":"baz","baz":456}]
EOS

=>
[
  {
    "baz": 789,
    "foo": "bar"
  },
  {
    "baz": 456,
    "foo": "baz"
  }
]

Note the non-obvious requirement for backticks around literals. If you omit them, the expression fails:

jp '[?baz > 200]' <<EOS
[{"foo":"bar"},
{"foo":"baz"},
{"foo":"bar","baz":123},
{"foo":"bar","baz":789},
{"foo":"baz","baz":456}]
EOS

=>
SyntaxError: Invalid token: tNumber
[?baz > 200]
        ^

String literals can either be written in single quotes, or wrapped in backticks and double-quotes. I find it pretty ugly, but then again, so are complex jq queries.

Aside: If you decide to give jq a set of records as a JSON list like we just did with jp, then it's a bit more awkward:

jq 'map(select(.baz > 200))' <<EOS  # or use -s to slurp into a list
[{"foo":"bar"},
{"foo":"baz"},
{"foo":"bar","baz":123},
{"foo":"bar","baz":789},
{"foo":"baz","baz":456}]
EOS

=>
[
  {
    "foo": "bar",
    "baz": 789
  },
  {
    "foo": "baz",
    "baz": 456
  }
]

But you probably wouldn't do that anyway with jq.

@candlerb
Contributor

candlerb commented Sep 4, 2019

There's another way jq could be used: just for predicate testing.

jq '.baz > 200' <<EOS
{"foo":"bar"}
{"foo":"baz"}
{"foo":"bar","baz":123}
{"foo":"bar","baz":789}
{"foo":"baz","baz":456}
EOS

=>
false
false
false
true
true

If Loki sees the value 'false' it could drop the record, and if it sees 'true' it could pass the original record through unchanged. Any other value would also be passed through. This would allow the most common queries to be simplified.

@kiney

kiney commented Sep 4, 2019

@candlerb I also consider jq a slightly more pleasant language, but JMESPath is already used in Loki and is also the standard in other cloud tooling (aws cli, azure cli, ansible...).
I thought about using it similarly to the current regex support: apply it to each line, filter out lines with no match, and highlight the matched part in the remaining lines.

@cyriltovena cyriltovena added the "keepalive" label (An issue or PR that will be kept alive and never marked as stale.) on Sep 5, 2019
@sandstrom
Author

sandstrom commented May 19, 2020

Updates

  • This design proposal doesn't directly address high-cardinality labels, but may solve the same underlying problem.

    For some use-cases it would be equivalent to high-cardinality labels, as long as the LogQL queries can run fast enough on large log corpora.

  • There is some discussion in this thread, on making it easier to find unique IDs, such as trace-id and request-id

EDIT: copied to top comment

cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021
periklis pushed a commit to periklis/loki that referenced this issue Dec 12, 2022
[release-5.6] Add release channel stable-5.6
@feldentm-SAP

This also affects handling of OpenTelemetry attributes and TraceContext, especially if the set of attributes is not trivial. If you have thousands of service instances in your Loki and you want to run queries like "give me all lines that have a certain action attribute set and failed", you cannot restrict that query to specific instances. Restricting it by time is also an issue, because you might miss some interesting events. Not having this makes it harder to find the root causes of sporadic defects.

@lsampras
Contributor

lsampras commented Jul 4, 2023

Are there any further updates on this issue?

I see some solutions being built for the trace-id use-case...

But is there a plan for supporting user-defined high-cardinality labels over a large time span?

@candlerb
Contributor

candlerb commented Jul 4, 2023

I think you can do what you need using the LogQL json filter. This extracts "labels" from the JSON body, but they are not logstream labels, so they don't affect cardinality.

These are not indexed, so it will perform a linear search across the whole timespan of interest. If you scale up Loki then it might be able to parallelize this across chunks - I'm not sure.
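
For example, something along these lines (a sketch, assuming a stream selector like {app="myapp"} and the field names from the original example; the json parser sanitizes keys, so "user-id" becomes the label user_id):

{app="myapp"} | json | user_id="abc"
{app="myapp"} | json | request_id="456" | line_format "{{.message}}"

Both run at "grep speed" over the selected streams rather than using the index.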

But if you want fully indexed searching, then I believe Loki isn't for you. You can either:

  • throw a ton of resources at the problem with Elasticsearch (tons of RAM, tons of SSD, expect your log volumes to increase by a factor of 10); or
  • dump all your logs into one of the "big data" products (hive, druid, kudu, drill, clickhouse etc), possibly using an on-disk format like parquet optimised for columnar searching; or
  • dip your toe in the water with the recently-announced VictoriaLogs. This looks like it hits a sweet spot with word splitting and indexing, but is not yet considered production-ready. Maybe Loki will consider adding something similar in future.

@sandstrom
Author

For others interested in this, this is the most recent "statement" I've seen from the Loki team:

#1282 (comment)

@lsampras
Contributor

lsampras commented Jul 4, 2023

But if you want fully indexed searching, then I believe Loki isn't for you.

Yeah, creating a complete index would be out of scope and against the design of Loki.

But are we considering some middle ground? For example:

  1. the ability to define certain labels for which chunks would be bloom filtered instead of separate streams
  2. letting users specify chunk IDs in the Loki query request so that they can build/maintain their own indexing, etc.
  3. we've also had some ideas around adding a co-processor

I wanted to gauge what, in this case, we would be willing to extend in Loki to support this (without going against its core design principles).

@valyala

valyala commented Jul 13, 2023

Loki can implement support for high-cardinality labels in a way similar to VictoriaLogs - it just doesn't associate these labels with log streams. Instead, it stores them inside the stream - see storage architecture details. This resolves the following issues:

  • High RAM usage, OOM crashes, high number of open files, significant slowdown
  • Slow query performance when high-cardinality labels are stored in the log message itself.

The high-cardinality labels can then be queried with label_name:value filters - see these docs for details.

@sandstrom
Author

sandstrom commented Jan 12, 2024

Time flies!

Issue number 91 might finally have a nice solution! 🎉

I don't want to say never, but I'm not sure if this is something we're ever likely to support.

@tomwilkie I'm happy it wasn't never 😄

Six years later, there is a solution on the horizon:

(not GA just yet though)

@feldentm-SAP

Hi @sandstrom,

please inform us and your support organization once this is GA.
Also, please make sure that the GA version will allow ingesting data that already carries the structured metadata, irrespective of the kind of log source. For OTLP, there should be a means of configuring what to keep, or of transforming the log into whatever your internal format is going to be.

Kind regards and thanks for the great contribution

@sandstrom
Author

A quick update for people following this: Loki 3.0 has shipped with experimental bloom filters that address exactly this issue. This is awesome news! 🎉

https://grafana.com/blog/2024/04/09/grafana-loki-3.0-release-all-the-new-features/


I'll keep this open until it's no longer experimental, which is probably in 3.1 I'd guess, unless a team member of Loki wants to close this out already, since it has shipped (albeit in an experimental state) -- I'm fine with either.
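
For anyone skimming: the accelerated query shape is the needle-in-a-haystack pattern discussed throughout this thread, i.e. a broad stream selector plus a line filter on a high-cardinality value (the selector and id below are made-up placeholders):

{cluster="prod"} |= "request-id=4f6c1b2a"

With the experimental bloom filters enabled, chunks that cannot contain the filtered string can be skipped instead of being scanned.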

@chaudum
Contributor

chaudum commented Apr 29, 2024

I'll keep this open until it's no longer experimental, which is probably in 3.1 I'd guess, unless a team member of Loki wants to close this out already, since it has shipped (albeit in an experimental state) -- I'm fine with either.

Sounds good to me 😄
