Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduced named JQ filters to cover most common filters and simplify the definition of the weaver.yaml config file. #211

Closed
lquerel opened this issue Jun 13, 2024 · 8 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@lquerel
Copy link
Contributor

lquerel commented Jun 13, 2024

Motivation

JQ is a very powerful language for expressing filtering, projections, sorting, and grouping, but the learning curve to master it is significant. In most scenarios, documentation and code authors will reuse existing JQ expressions, and the definition of new JQ filters will be uncommon and reserved for advanced Weaver developers.

Proposal

We propose to introduce named JQ expressions that can be directly referenced in the weaver.yaml configuration files. A catalog of the most common JQ expressions will also be defined in an extensible way.

Currently, we have complex JQ expression like these ones:

templates:
    # Templates for the `attribute_group` group type
  - pattern: attributes/mod.rs.j2
    # The following JQ filter extracts the id, type, brief, and prefix of groups matching the following criteria:
    # - groups with an id starting with the prefix `registry.`
    # - groups of the type `attribute_group`.
    # - groups are deduplicated by namespace.
    # - groups are sorted by namespace.
    filter: >
      if $attributes then
        .groups
        | map(select(.id | startswith($registry_prefix)))
        | map(select(.type == "attribute_group")    
          | { 
            id, 
            type, 
            brief, 
            prefix})
        | unique_by(.id | split(".") | .[1])
        | sort_by(.id | split(".") | .[1])
      else
        empty
      end
    application_mode: single
  - pattern: attributes/attributes.rs.j2
    # The following JQ filter extracts the id, type, brief, prefix, and attributes of groups matching the following
    # criteria:
    # - groups with an id starting with the prefix `registry.`
    # - groups of the type `attribute_group`.
    # - groups are sorted by namespace.
    filter: >
      if $attributes then
        .groups 
        | map(select(.id | startswith($registry_prefix))) 
        | map(select(.type == "attribute_group") 
          | { 
            id, 
            type, 
            brief, 
            prefix, 
            attributes}) 
        | group_by(.id | split(".") | .[1]) 
        | map({
            id: (map(select(.id | endswith(".deprecated") | not)) | first).id,
            type: (map(select(.id | endswith(".deprecated") | not)) | first).type,
            brief: (map(select(.id | endswith(".deprecated") | not)) | first).brief,
            prefix: (map(select(.id | endswith(".deprecated") | not)) | first).prefix,
            attributes: map(.attributes) | add
          })
        | sort_by(.id | split(".") | .[1])
      else
        empty
      end
    application_mode: each

    # Templates for the `metric` group type
  - pattern: metrics/mod.rs.j2
    # The following JQ filter extracts the id, type, brief, and prefix of groups matching the following criteria:
    # - groups with an id starting with the prefix `metric.`
    # - groups of the type `metric`.
    # - groups are deduplicated by namespace.
    # - groups are sorted by prefix.
    filter: >
      if $metrics then
        .groups
        | map(select(.id | startswith("metric."))) 
        | map(select(.type == "metric")    
          | { 
            id, 
            type, 
            brief, 
            prefix})
        | unique_by(.id | split(".") | .[1])
        | sort_by(.id | split(".") | .[1])
      else
        empty
      end
    application_mode: single
  - pattern: metrics/metrics.rs.j2
    # The following JQ filter extracts the id, type, brief, prefix, and attributes of groups matching the following
    # criteria:
    # - groups with an id starting with the prefix `metric.`
    # - groups of the type `metric`.
    # - groups are sorted by namespace.
    filter: >
      if $metrics then
        .groups 
        | map(select(.id | startswith("metric."))) 
        | group_by(.id | split(".") | .[1]) 
        | map({ 
          prefix: .[0].id | split(".") | .[1],
          groups: .   
        })
      else
        empty
      end
    application_mode: each

We could create a catalog of the most common JQ expressions and assign a name to each entry (see below). A default catalog will be provided by Weaver, but users should also be able to extend it with their own filters. More details on the hierarchical definition of the Weaver configuration in #230.

filter_catalog:
  extract_registry_attribute_groups:
    type: jq
    description: >
      Extracts the id, type, brief, and prefix of groups matching the following criteria:
      - groups with an id starting with the prefix `registry.`
      - groups of the type `attribute_group`.
      - groups are deduplicated by namespace.
      - groups are sorted by namespace.
    expression: >
      if $attributes then
        .groups
        | map(select(.id | startswith($registry_prefix)))
        | map(select(.type == "attribute_group")    
          | { 
            id, 
            type, 
            brief, 
            prefix})
        | unique_by(.id | split(".") | .[1])
        | sort_by(.id | split(".") | .[1])
      else
        empty
      end

  extract_registry_attribute_groups_with_attributes:
    type: jq
    description: >
      Extracts the id, type, brief, prefix, and attributes of groups matching the following criteria:
      - groups with an id starting with the prefix `registry.`
      - groups of the type `attribute_group`.
      - groups are sorted by namespace.
    expression: >
      if $attributes then
        .groups 
        | map(select(.id | startswith($registry_prefix))) 
        | map(select(.type == "attribute_group") 
          | { 
            id, 
            type, 
            brief, 
            prefix, 
            attributes}) 
        | group_by(.id | split(".") | .[1]) 
        | map({
            id: (map(select(.id | endswith(".deprecated") | not)) | first).id,
            type: (map(select(.id | endswith(".deprecated") | not)) | first).type,
            brief: (map(select(.id | endswith(".deprecated") | not)) | first).brief,
            prefix: (map(select(.id | endswith(".deprecated") | not)) | first).prefix,
            attributes: map(.attributes) | add
          })
        | sort_by(.id | split(".") | .[1])
      else
        empty
      end

  extract_metric_groups:
    type: jq
    description: >
      Extracts the id, type, brief, and prefix of groups matching the following criteria:
      - groups with an id starting with the prefix `metric.`
      - groups of the type `metric`.
      - groups are deduplicated by namespace.
      - groups are sorted by prefix.
    expression: >
      if $metrics then
        .groups
        | map(select(.id | startswith("metric."))) 
        | map(select(.type == "metric")    
          | { 
            id, 
            type, 
            brief, 
            prefix})
        | unique_by(.id | split(".") | .[1])
        | sort_by(.id | split(".") | .[1])
      else
        empty
      end

  extract_metric_groups_with_prefix:
    type: jq
    description: >
      Extracts the id, type, brief, prefix, and attributes of groups matching the following criteria:
      - groups with an id starting with the prefix `metric.`
      - groups of the type `metric`.
      - groups are sorted by namespace.
    expression: >
      if $metrics then
        .groups 
        | map(select(.id | startswith("metric."))) 
        | group_by(.id | split(".") | .[1]) 
        | map({ 
          prefix: .[0].id | split(".") | .[1],
          groups: .   
        })
      else
        empty
      end

With this catalog, the weaver.yaml configuration file will be:

templates:
  # Templates for the `attribute_group` group type
  - pattern: attributes/mod.rs.j2
    named_filter: extract_registry_attribute_groups
    application_mode: single

  - pattern: attributes/attributes.rs.j2
    named_filter: extract_registry_attribute_groups_with_attributes
    application_mode: each

  # Templates for the `metric` group type
  - pattern: metrics/mod.rs.j2
    named_filter: extract_metric_groups
    application_mode: single

  - pattern: metrics/metrics.rs.j2
    named_filter: extract_metric_groups_with_prefix
    application_mode: each

Important note: The exact content of the predefined JQ filters catalog is not yet in its final state.

Related Issues

@haidong
Copy link
Contributor

haidong commented Jul 6, 2024

I'd like to start working on this. I'll keep you posted about my progress in the next few days.

@lquerel
Copy link
Contributor Author

lquerel commented Jul 6, 2024

I'd like to start working on this. I'll keep you posted about my progress in the next few days.

Thanks! For your information, I am currently working on #230 . Please act as if it doesn’t exist, and we’ll find a way to merge the two PRs.

@lmolkova
Copy link
Contributor

lmolkova commented Jul 8, 2024

As an alternative, I'd like to suggest pre-populating data in both the JQ and jinja contexts.

I.e. ctx.grouped_attributes, ctx.grouped_metrics, ctx.grouped_enum_attributes - this was enough for existing codegen with build-tools, and might need more sugar over time.

I'd still be great to have some filters to filter based on stability which is dynamic, but that's a much smaller concern.

The benefit: people don't even need to write named_filter: extract_metric_groups, they just access the pre-massaged data in jinja.
Duplication/overprocessing/extra memory is really not an issue here - semantic conventions are tiny and this code is not on the hot path - it runs on dev machine every once in a while prior to submitting PR.

@lquerel
Copy link
Contributor Author

lquerel commented Jul 8, 2024

@lmolkova, @jsuereth I like the idea. We can make the definition of the JQ filter optional and generate a predefined version of the resolved registry that presents an organization of groups by signal types. We still need to define more precisely the exact organization of these groups, such as how they are sorted, whether we support certain filtering parameters, and so on. We can implement this without breaking what already exists.

@lquerel
Copy link
Contributor Author

lquerel commented Jul 8, 2024

@lmolkova, @jsuereth What about defining those groups as follow:

ctx.grouped_attributes:

if $attributes then
  .groups 
  | map(select(.id | startswith("registry."))) 
  | map(select(.type == "attribute_group") 
    | { 
      id, 
      type, 
      brief, 
      prefix, 
      attributes}) 
  | group_by(.id | split(".") | .[1]) 
  | sort_by(.id | split(".") | .[1])
else
  empty
end

ctx.grouped_metrics:

if $metrics then
  .groups
  | map(select(.type == "metric")    
    | { 
      id, 
      type, 
      brief, 
      prefix, 
      attributes})
  | unique_by(.id | split(".") | .[1])
  | sort_by(.id | split(".") | .[1])
else
  empty
end

$attributes and $metrics are parameters that could be used to disable the generation of those groups via the params section already in place.

Similar predefined groups could be created for spans.

@lmolkova I'm not sure what you mean by grouped_enum_attributes.

@lquerel lquerel moved this to In Progress in OTel Weaver Project Jul 10, 2024
@haidong
Copy link
Contributor

haidong commented Jul 10, 2024

I've been digging around looking for the right files in the right directory to change. I don't know where yet.

After thinking about both the original proposal and alternatives, I actually wonder if "doing nothing" at the moment is the right approach.

Meantime, I've noticed some typos and repetition in docs. I will submit a PR to fix that while mulling this over a bit more.

@lquerel
Copy link
Contributor Author

lquerel commented Jul 11, 2024

@haidong I agree, it's better to mature this idea first before implementing any alternatives.

@lquerel
Copy link
Contributor Author

lquerel commented Jul 22, 2024

Closed because we decided to follow a different approach, see #246

@lquerel lquerel closed this as completed Jul 22, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in OTel Weaver Project Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
Status: Done
Development

No branches or pull requests

3 participants