Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cumulative sum expression function #80129

Merged
merged 11 commits into from
Oct 20, 2020

Conversation

flash1293
Copy link
Contributor

@flash1293 flash1293 commented Oct 12, 2020

Summary

Related to #61776

This PR adds a "cumulative sum" expression function. It's intended to be used as part of the upcoming Lens operation, but can also be used within Canvas as a standalone function.

Example:

filters
| demodata
| cumulative_sum inputColumnId="price" outputColumnId="cumulative_price" by="state"
| table
| render

This PR doesn't add public documentation as part of the Canvas function reference - we have to decide whether we want to advertise the function like this.

The exact behavior (especially around edge cases) is described as a comment of the function implementation - this can serve as a basis for the public documentation.
https://github.com/elastic/kibana/pull/80129/files#diff-8b017bbbf9f7d6623f47dbc16f422007R40

Checklist

@flash1293 flash1293 added enhancement New value added to drive a business result v7.11.0 v8.0.0 labels Oct 12, 2020
@flash1293 flash1293 marked this pull request as ready for review October 12, 2020 12:52
@flash1293 flash1293 requested a review from a team as a code owner October 12, 2020 12:52
@flash1293 flash1293 added Team:AppArch Team:Visualizations Visualization editors, elastic-charts and infrastructure labels Oct 12, 2020
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app-arch (Team:AppArch)

@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@flash1293 flash1293 added release_note:enhancement and removed enhancement New value added to drive a business result labels Oct 12, 2020
newRow[column] = Number(newRow[column]) + accumulatorValue;
accumulators[bucketIdentifier] = newRow[column];
} else {
newRow[column] = accumulatorValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about this again, it does seem a bit weird that the comulative sum would overwrite the current value, or at least that being the only option.

could we add another argument, name which would be the name of the new column ? (what map does)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and i guess (what do you think @lukeelmers ) we can still replace the column if no name is provided ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it but didn't implement eventually because you can put that together quite easily yourself using mapColumn like in the example in the description (trying to keep the function as single-purpose as possible). I'm fine with adding it if you think it's the right thing though, in that case what about input and output arguments (name/column isn't really descriptive anymore)

Copy link
Contributor Author

@flash1293 flash1293 Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking about it again, what about the following API?

  • name or unnamed arg are the target column and a required argument.
  • from is an optional argument and references the column to calculate the cumulative sum for - if it's not set, it defaults to name

Then you can do these:

demodata | cumulative_sum "price"
demodata | cumulative_sum "price" by="state"
demodata | cumulative_sum "cumulative_price" from="price" by="state"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We can't replace the column in Lens because the same inner column could be used for multiple metrics. Imagine that the user has configured Count and Cumulative sum of count- we need to use the original column, and create a copy.

  2. There are no existing expression functions which are able to create a column with the fieldFormatter params, or that can "copy" an existing column

For both these reasons, my preference is to use the API we last discussed here: #61775 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ to not overwriting the column (which it sounds like we can't do anyway)

There are no existing expression functions which are able to create a column with the fieldFormatter params

As an aside: I didn't realize mapColumn doesn't copy the full meta, but perhaps it should be provided as an option there as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we ever want to change formatter (looking at the issue linked above)? i think that's not a good idea. I think the resulting column should keep the same format information as original column had (why would doing a sum/derivative/ some other calculation affect the format ?)

also thinking about it again, seems overriding existing column:

  • can't be done for various reasons where we need source column later
  • could be an edge case, but we can also achieve this by just dropping original column later, so i think we should leave the option to replace the column out, which makes target column id/name required parameters (if name is not provided we can use name=id)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would we ever want to change formatter (looking at the issue linked above)?

I'm not sure about the use case for changing the formatter either, but it does feel like if we are making any expression functions that copy columns (currently just mapColumn) we should at least be preserving the full meta, or as much of it as makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely see your point @lukeelmers, but mapColumn is not necessarily mapping just a single column, it's acting on the whole row. Maybe a separate function would be justified. I will create a separate issue for discussing this (IMHO it doesn't block moving forward with this PR).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ Yeah it shouldn't block moving forward with this, but point taken on it acting on a whole row.

@botelastic botelastic bot added the Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) label Oct 12, 2020
Copy link
Contributor

@wylieconlon wylieconlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't done a full review of this, but wanted to leave my API feedback first.

newRow[column] = Number(newRow[column]) + accumulatorValue;
accumulators[bucketIdentifier] = newRow[column];
} else {
newRow[column] = accumulatorValue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We can't replace the column in Lens because the same inner column could be used for multiple metrics. Imagine that the user has configured Count and Cumulative sum of count- we need to use the original column, and create a copy.

  2. There are no existing expression functions which are able to create a column with the fieldFormatter params, or that can "copy" an existing column

For both these reasons, my preference is to use the API we last discussed here: #61775 (comment)

@flash1293
Copy link
Contributor Author

Thanks for pointing me to the derivative API discussion @wylieconlon - I will implement that, it looks like it nicely covers all cases.

@flash1293
Copy link
Contributor Author

Adjusted the behavior:

  • Separate output column is required
  • Some error handling
  • Copies over meta data (no option to overwrite formatter information)

Copy link
Contributor

@wylieconlon wylieconlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the tests here very difficult to verify, so I submitted a PR to try to simplify them. Also did some performance testing using the expression debug mode using Canvas, and a dataset of 10k rows in Canvas. Baseline results with 10k rows was that this function executed in 16ms without grouping, and 19ms with a single grouping. Compared to mapColumn with a fixed string value on the same 10k rows, which executes in 3800ms, this is a huge improvement.

},
{ inputColumnId: 'val', outputColumnId: 'output' }
)
).toMatchInlineSnapshot(`
Copy link
Contributor

@wylieconlon wylieconlon Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found these snapshot-based tests very difficult to glance through and to match up the input vs output expected values. In particular, the snapshots are asserting some extra information that I don't think we require tests for, like testing that the previous rows and columns are kept.

What do you think about changing the test style to the shortest-possible code?

    expect(result.columns).toContainEqual({
      id: 'output',
      name: 'output',
      meta: { type: 'number' },
    });
    expect(result.rows.map((row) => row.output)).toEqual([5, 12, 15, 17]);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I was thinking the same thing, would be a bit easier to read if we maybe had just one snapshot test, but asserted the specific values on the others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s a great idea @wylieconlon , feel free to push directly to this PR, if it’s not blocking any other efforts I can also clean it up on Friday.

import { ExecutionContext } from '../../../execution/types';
import { Datatable } from '../../../expression_types/specs/datatable';

describe('interpreter/functions#cumulative_sum', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a missing test case where the initial values are null, by reading through the elasticsearch test cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

},
{ inputColumnId: 'val', outputColumnId: 'output' }
)
).toMatchInlineSnapshot(`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ I was thinking the same thing, would be a bit easier to read if we maybe had just one snapshot test, but asserted the specific values on the others.

@flash1293
Copy link
Contributor Author

@elasticmachine merge upstream

@flash1293
Copy link
Contributor Author

Thanks for the review and for the test improvements! I cleaned up the rest of the comments.
@lukeelmers Could you verify the exporting etc is done right?

@flash1293
Copy link
Contributor Author

@elasticmachine merge upstream

Copy link
Member

@lukeelmers lukeelmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code updates LGTM, pending a green build

@flash1293
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

@kbn/optimizer bundle module count

id before after diff
expressions 104 105 +1

distributable file count

id before after diff
default 48035 48036 +1
oss 28563 28564 +1

page load bundle size

id before after diff
expressions 196.4KB 201.7KB +5.4KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@flash1293
Copy link
Contributor Author

@wylieconlon Do you want to have another look at this?

Copy link
Contributor

@wylieconlon wylieconlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@flash1293 flash1293 merged commit e1bd1e8 into elastic:master Oct 20, 2020
flash1293 added a commit to flash1293/kibana that referenced this pull request Oct 20, 2020
jloleysens added a commit to jloleysens/kibana that referenced this pull request Oct 20, 2020
…lout-for-warm-and-cold-tier

* 'master' of github.com:elastic/kibana: (126 commits)
  Add cumulative sum expression function (elastic#80129)
  [APM] Fix link to trace (elastic#80993)
  Provide url rewritten in onPreRouting interceptor (elastic#80810)
  limit renovate to npm packages
  Fix bug in logs UI link (elastic#80943)
  [Monitoring] Fix bug with setup mode appearing on pages it shouldn't (elastic#80343)
  [Security Solution][Detection Engine] Fixes false positives caused by empty records in threat list
  docs test (elastic#81080)
  Fixed alerts ui test timeout issue, related to the multiple server calls for delete all alerts, by reducing the number of alerts to the two and increasing retry timeout. (elastic#81067)
  [APM] Fix service map highlighted edge on node select (elastic#80791)
  Fix typo in toast, slight copy adjustment. (elastic#80843)
  [Security Solution] reduce optimizer limits (elastic#80997)
  [maps] 7.10 documentation updates (elastic#79917)
  [Workplace Search] Fix Group Prioritization route and clean up design (elastic#80903)
  [Enterprise Search] Added reusable HiddenText component to Credentials (elastic#80033)
  Upgrade EUI to v29.5.0 (elastic#80753)
  [Maps] Fix layer-flash when changing style (elastic#80948)
  [Security Solution] [Detections] Disable edit button when user does not have actions privileges w/ rule + actions (elastic#80220)
  [Enterprise Search] Handle loading state on Credentials page (elastic#80035)
  [Monitoring] Fix cluster listing page in how it handles global state (elastic#78979)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) release_note:enhancement Team:Visualizations Visualization editors, elastic-charts and infrastructure v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants