Add cumulative sum expression function #80129

flash1293 · 2020-10-12T10:58:18Z

Summary

Related to #61776

This PR adds a "cumulative sum" expression function. It's intended to be used as part of the upcoming Lens operation, but can also be used within Canvas as a standalone function.

Example:

filters
| demodata
| cumulative_sum inputColumnId="price" outputColumnId="cumulative_price" by="state"
| table
| render

This PR doesn't add public documentation as part of the Canvas function reference - we have to decide whether we want to advertise the function like this.

The exact behavior (especially around edge cases) is described in a comment on the function implementation, which can serve as a basis for the public documentation.
https://github.com/elastic/kibana/pull/80129/files#diff-8b017bbbf9f7d6623f47dbc16f422007R40
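
For readers without the diff at hand, here is a minimal sketch of what a grouped cumulative sum over table rows could look like. It is not the Kibana implementation: the helper name cumulativeSum, the Row type, and the handling of missing values are assumptions made for illustration, loosely following the accumulator-per-bucket pattern visible in the diff excerpt quoted in the review below.

```typescript
type Row = Record<string, unknown>;

// Hypothetical helper for illustration only, not the function added by this PR.
function cumulativeSum(
  rows: Row[],
  inputColumnId: string,
  outputColumnId: string,
  by: string[] = []
): Row[] {
  // One running total per bucket; without `by` columns all rows share one bucket.
  const accumulators: Record<string, number> = {};
  return rows.map((row) => {
    const bucketIdentifier = by.map((column) => String(row[column])).join('|');
    const accumulatorValue = accumulators[bucketIdentifier] ?? 0;
    const value = row[inputColumnId];
    const newRow: Row = { ...row };
    if (value != null && !Number.isNaN(Number(value))) {
      const sum = Number(value) + accumulatorValue;
      newRow[outputColumnId] = sum;
      accumulators[bucketIdentifier] = sum;
    } else {
      // Assumption: a missing or non-numeric value carries the running total forward.
      newRow[outputColumnId] = accumulatorValue;
    }
    return newRow;
  });
}

const result = cumulativeSum(
  [
    { state: 'done', price: 5 },
    { state: 'running', price: 7 },
    { state: 'done', price: 3 },
    { state: 'running', price: null },
  ],
  'price',
  'cumulative_price',
  ['state']
);
// cumulative_price per row: 5, 7, 8, 7
console.log(result.map((row) => row.cumulative_price));
```

The input rows are copied rather than mutated, so the original price column stays available to later functions in the pipeline.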

Checklist

Documentation was added for features that require explanation or tutorials
Unit or functional tests were updated or added to match the most common scenarios

elasticmachine · 2020-10-12T12:52:24Z

Pinging @elastic/kibana-app-arch (Team:AppArch)

elasticmachine · 2020-10-12T12:52:24Z

Pinging @elastic/kibana-app (Team:KibanaApp)

ppisljar · 2020-10-12T13:53:14Z

src/plugins/expressions/common/expression_functions/specs/cumulative_sum.ts

+          newRow[column] = Number(newRow[column]) + accumulatorValue;
+          accumulators[bucketIdentifier] = newRow[column];
+        } else {
+          newRow[column] = accumulatorValue;


thinking about this again, it does seem a bit weird that the cumulative sum would overwrite the current value, or at least that this is the only option.

could we add another argument, name, which would be the name of the new column? (what map does)

and i guess (what do you think @lukeelmers?) we can still replace the column if no name is provided?

I thought about it but ultimately didn't implement it, because you can put that together quite easily yourself using mapColumn like in the example in the description (trying to keep the function as single-purpose as possible). I'm fine with adding it if you think it's the right thing though. In that case, what about input and output arguments? (name/column isn't really descriptive anymore)

Actually, thinking about it again, what about the following API?

name (or the unnamed argument) is the target column and is required.

from is an optional argument that references the column to calculate the cumulative sum for. If it's not set, it defaults to name.

Then you can do these:

demodata | cumulative_sum "price"
demodata | cumulative_sum "price" by="state"
demodata | cumulative_sum "cumulative_price" from="price" by="state"
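
To make the defaulting rule in this proposal concrete, here is a small sketch of how the arguments could resolve to input and output columns. The interface and function names are hypothetical, and this API was only a suggestion in the thread, not what was eventually merged.

```typescript
// Hypothetical argument shape for the API proposed in this comment.
interface ProposedArgs {
  name: string; // target column, required (also the unnamed argument)
  from?: string; // source column, optional
  by?: string[]; // optional grouping columns
}

// Resolve which column is read and which one is written.
function resolveColumns(args: ProposedArgs) {
  return {
    outputColumn: args.name,
    // If `from` is not set, the sum is calculated for (and written to) `name`.
    inputColumn: args.from ?? args.name,
    by: args.by ?? [],
  };
}

// cumulative_sum "price"
console.log(resolveColumns({ name: 'price' }));
// cumulative_sum "cumulative_price" from="price" by="state"
console.log(resolveColumns({ name: 'cumulative_price', from: 'price', by: ['state'] }));
```

With only name given, the column is overwritten in place. Giving from as well keeps the source column and writes the result to a new one.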

We can't replace the column in Lens because the same inner column could be used for multiple metrics. Imagine that the user has configured Count and Cumulative sum of count: we need to use the original column, and create a copy.

There are no existing expression functions which are able to create a column with the fieldFormatter params, or that can "copy" an existing column

For both these reasons, my preference is to use the API we last discussed here: #61775 (comment)

++ to not overwriting the column (which it sounds like we can't do anyway)

There are no existing expression functions which are able to create a column with the fieldFormatter params

As an aside: I didn't realize mapColumn doesn't copy the full meta, but perhaps it should be provided as an option there as well.

why would we ever want to change the formatter (looking at the issue linked above)? i think that's not a good idea. I think the resulting column should keep the same format information as the original column had (why would doing a sum/derivative/some other calculation affect the format?)

also thinking about it again, seems overriding existing column:

can't be done for various reasons where we need source column later

could be an edge case, but we can also achieve this by just dropping original column later, so i think we should leave the option to replace the column out, which makes target column id/name required parameters (if name is not provided we can use name=id)

why would we ever want to change formatter (looking at the issue linked above)?

I'm not sure about the use case for changing the formatter either, but it does feel like if we are making any expression functions that copy columns (currently just mapColumn) we should at least be preserving the full meta, or as much of it as makes sense.

I definitely see your point @lukeelmers, but mapColumn is not necessarily mapping just a single column, it's acting on the whole row. Maybe a separate function would be justified. I will create a separate issue for discussing this (IMHO it doesn't block moving forward with this PR).

++ Yeah it shouldn't block moving forward with this, but point taken on it acting on a whole row.

wylieconlon

Haven't done a full review of this, but wanted to leave my API feedback first.

wylieconlon · 2020-10-12T20:56:34Z

src/plugins/expressions/common/expression_functions/specs/cumulative_sum.ts

+          newRow[column] = Number(newRow[column]) + accumulatorValue;
+          accumulators[bucketIdentifier] = newRow[column];
+        } else {
+          newRow[column] = accumulatorValue;


We can't replace the column in Lens because the same inner column could be used for multiple metrics. Imagine that the user has configured Count and Cumulative sum of count: we need to use the original column, and create a copy.

There are no existing expression functions which are able to create a column with the fieldFormatter params, or that can "copy" an existing column

For both these reasons, my preference is to use the API we last discussed here: #61775 (comment)

flash1293 · 2020-10-12T21:03:16Z

Thanks for pointing me to the derivative API discussion @wylieconlon - I will implement that, it looks like it nicely covers all cases.

flash1293 · 2020-10-13T13:53:20Z

Adjusted the behavior:

Separate output column is required
Some error handling
Copies over meta data (no option to overwrite formatter information)
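
A rough sketch of how the column side of this adjusted behavior might look: the output column is required, it must not collide with an existing column, and it inherits the input column's meta. The function name, the error message, and the handling of a missing input column are assumptions for illustration. The thread only says "some error handling" without giving specifics.

```typescript
interface DatatableColumn {
  id: string;
  name: string;
  meta: { type: string; params?: Record<string, unknown> };
}

// Hypothetical helper: builds the column list for the result table.
function withOutputColumn(
  columns: DatatableColumn[],
  inputColumnId: string,
  outputColumnId: string,
  outputColumnName: string = outputColumnId
): DatatableColumn[] {
  // Assumption: writing to an existing column is treated as an error.
  if (columns.some((column) => column.id === outputColumnId)) {
    throw new Error(`Output column "${outputColumnId}" already exists`);
  }
  const inputIndex = columns.findIndex((column) => column.id === inputColumnId);
  // Assumption: with no matching input column, the columns pass through unchanged.
  if (inputIndex === -1) {
    return columns;
  }
  // Copy the meta (type and formatter params) so the output formats like the input.
  const output: DatatableColumn = {
    id: outputColumnId,
    name: outputColumnName,
    meta: { ...columns[inputIndex].meta },
  };
  // Place the new column directly after the input column.
  return [...columns.slice(0, inputIndex + 1), output, ...columns.slice(inputIndex + 1)];
}

const columns = withOutputColumn(
  [{ id: 'val', name: 'val', meta: { type: 'number' } }],
  'val',
  'output'
);
console.log(columns.map((column) => column.id)); // ids: val, output
```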

wylieconlon

I found the tests here very difficult to verify, so I submitted a PR to try to simplify them. I also did some performance testing with the expression debug mode in Canvas, on a dataset of 10k rows. The baseline result was that this function executed in 16ms without grouping, and 19ms with a single grouping. Compared to mapColumn with a fixed string value on the same 10k rows, which executes in 3800ms, this is a huge improvement.

wylieconlon · 2020-10-14T19:19:53Z

src/plugins/expressions/common/expression_functions/specs/tests/cumulative_sum.test.ts

+        },
+        { inputColumnId: 'val', outputColumnId: 'output' }
+      )
+    ).toMatchInlineSnapshot(`


I found these snapshot-based tests very difficult to glance through and to match up the input vs output expected values. In particular, the snapshots are asserting some extra information that I don't think we require tests for, like testing that the previous rows and columns are kept.

What do you think about changing the test style to the shortest-possible code?

expect(result.columns).toContainEqual({
  id: 'output',
  name: 'output',
  meta: { type: 'number' },
});
expect(result.rows.map((row) => row.output)).toEqual([5, 12, 15, 17]);

++ I was thinking the same thing, would be a bit easier to read if we maybe had just one snapshot test, but asserted the specific values on the others.

That’s a great idea @wylieconlon, feel free to push directly to this PR. If it’s not blocking any other efforts, I can also clean it up on Friday.

wylieconlon · 2020-10-14T20:05:41Z

src/plugins/expressions/common/expression_functions/specs/tests/cumulative_sum.test.ts

+import { ExecutionContext } from '../../../execution/types';
+import { Datatable } from '../../../expression_types/specs/datatable';
+
+describe('interpreter/functions#cumulative_sum', () => {


I found a missing test case where the initial values are null, by reading through the Elasticsearch test cases.

It should be covered by https://github.com/elastic/kibana/pull/80129/files/e67af0bf5a21b5b36b37ad83af38edccb032c4a0#diff-228141ae2c11c0c6ec6d2f2b4008e3143fa53429bf6bbf609ccbb5b0904f8894R648

lukeelmers · 2020-10-14T20:46:23Z

src/plugins/expressions/common/expression_functions/specs/tests/cumulative_sum.test.ts

+        },
+        { inputColumnId: 'val', outputColumnId: 'output' }
+      )
+    ).toMatchInlineSnapshot(`


++ I was thinking the same thing, would be a bit easier to read if we maybe had just one snapshot test, but asserted the specific values on the others.

flash1293 · 2020-10-16T12:44:38Z

@elasticmachine merge upstream

flash1293 · 2020-10-16T16:41:28Z

Thanks for the review and for the test improvements! I cleaned up the rest of the comments.
@lukeelmers Could you verify the exporting etc is done right?

flash1293 · 2020-10-16T18:38:54Z

@elasticmachine merge upstream

lukeelmers

Code updates LGTM, pending a green build

flash1293 · 2020-10-19T07:46:28Z

@elasticmachine merge upstream

kibanamachine · 2020-10-19T09:24:23Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request
Commit: f9c58b1

Metrics [docs]

@kbn/optimizer bundle module count

id	before	after	diff
`expressions`	104	105	+1

distributable file count

id	before	after	diff
`default`	48035	48036	+1
`oss`	28563	28564	+1

page load bundle size

id	before	after	diff
`expressions`	196.4KB	201.7KB	+5.4KB

History

💚 Build #82257 succeeded 6e22969
💔 Build #82227 failed 9dca454
💔 Build #82141 failed 583d42c
💔 Build #81750 failed f48880d
💚 Build #81323 succeeded e67af0b

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

flash1293 · 2020-10-19T09:27:02Z

@wylieconlon Do you want to have another look at this?

wylieconlon

LGTM!

add cumulative sum expression function

17d1e0b

flash1293 added the enhancement (New value added to drive a business result), v7.11.0, and v8.0.0 labels Oct 12, 2020

add required flags

93dc455

flash1293 marked this pull request as ready for review October 12, 2020 12:52

flash1293 requested a review from a team as a code owner October 12, 2020 12:52

flash1293 added the Team:AppArch and Team:Visualizations (Visualization editors, elastic-charts and infrastructure) labels Oct 12, 2020

flash1293 requested review from wylieconlon, mbondyra and dej611 October 12, 2020 12:52

flash1293 added the release_note:enhancement label and removed the enhancement (New value added to drive a business result) label Oct 12, 2020

ppisljar reviewed Oct 12, 2020

botelastic bot added the Feature:ExpressionLanguage (Interpreter expression language, aka canvas pipeline) label Oct 12, 2020

wylieconlon reviewed Oct 12, 2020

flash1293 added 2 commits October 13, 2020 15:18

Merge remote-tracking branch 'upstream/master' into lens/cumulative-sum

5d29f58

review comments

950e642

fix non number handling

e67af0b

flash1293 requested review from wylieconlon, ppisljar and lukeelmers October 14, 2020 08:47

Simplify tests

f48880d

wylieconlon reviewed Oct 14, 2020

lukeelmers reviewed Oct 14, 2020

kibanamachine and others added 3 commits October 16, 2020 08:44

Merge branch 'master' into lens/cumulative-sum

583d42c

Merge remote-tracking branch 'upstream/master' into lens/cumulative-sum

61f6d47

review comments

9dca454

Merge branch 'master' into lens/cumulative-sum

6e22969

lukeelmers approved these changes Oct 16, 2020

Merge branch 'master' into lens/cumulative-sum

f9c58b1

wylieconlon approved these changes Oct 19, 2020

flash1293 merged commit e1bd1e8 into elastic:master Oct 20, 2020

flash1293 added a commit to flash1293/kibana that referenced this pull request Oct 20, 2020

Add cumulative sum expression function (elastic#80129)

03cee3b

flash1293 mentioned this pull request Oct 20, 2020

[7.x] Add cumulative sum expression function (#80129) #81116

Merged

flash1293 added a commit that referenced this pull request Oct 20, 2020

Add cumulative sum expression function (#80129) (#81116)

c28f106

This was referenced Oct 20, 2020

[Lens] Add cumulative sum aggregation #61776

Closed

Add derivative function #81178

Merged

mbondyra mentioned this pull request Oct 30, 2020

Add moving average function #82122

Merged
