
[META]PPL new trendline command #3011

Open
YANG-DB opened this issue Sep 11, 2024 · 10 comments
Labels
enhancement New feature or request PPL Piped processing language

Comments

@YANG-DB
Member

YANG-DB commented Sep 11, 2024

Is your feature request related to a problem?

Add a new PPL trendline command that computes moving averages of fields.

We would like to support two flavours of moving average:

SMA : Simple moving average

  • f[i]: The value of field 'f' in the i-th data-point
  • n: The number of data-points in the moving window (period)
  • t: The current time index

SMA(t) = (1/n) * Σ(f[i]), where i = t-n+1 to t
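
The SMA formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the PPL implementation; the function name `sma` and the choice to emit `None` until a full window of n points is available are assumptions, since the proposal does not say how the first n-1 rows should behave.

```python
from typing import List, Optional


def sma(values: List[float], n: int) -> List[Optional[float]]:
    """Simple moving average with period n over time-ordered values."""
    if n <= 0:
        raise ValueError("period n must be positive")
    out: List[Optional[float]] = []
    for t in range(len(values)):
        if t < n - 1:
            # Assumption: no result until a full window is available.
            out.append(None)
        else:
            window = values[t - n + 1 : t + 1]  # i = t-n+1 .. t
            out.append(sum(window) / n)
    return out


print(sma([1, 2, 3, 4, 5], 3))
```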


WMA : Weighted moving average

WMA(t) = Σ(w[i] * f[i]) / Σ(w[i]), where i = t-n+1 to t
Where w[i] is the weight for the i-th data-point.

In a typical WMA, the weights are linearly decreasing from the most recent to the oldest data-point:
w[i] = n - (t - i), where i = t-n+1 to t

The complete formulation would be:
WMA(t) = Σ((n - (t - i)) * f[i]) / Σ(n - (t - i)), where i = t-n+1 to t
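
As a rough sketch of the WMA formula above: with w[i] = n - (t - i), the most recent point gets weight n and the oldest gets weight 1, so the denominator is 1 + 2 + ... + n = n(n+1)/2. The function name `wma` and the `None` result for incomplete windows are assumptions made for illustration, not behavior specified in this issue.

```python
from typing import List, Optional


def wma(values: List[float], n: int) -> List[Optional[float]]:
    """Linearly weighted moving average with period n."""
    if n <= 0:
        raise ValueError("period n must be positive")
    denom = n * (n + 1) / 2  # sum of the weights 1..n
    out: List[Optional[float]] = []
    for t in range(len(values)):
        if t < n - 1:
            # Assumption: no result until a full window is available.
            out.append(None)
            continue
        # w[i] = n - (t - i): weight n for i = t, weight 1 for i = t-n+1
        num = sum((n - (t - i)) * values[i] for i in range(t - n + 1, t + 1))
        out.append(num / denom)
    return out


print(wma([1, 2, 3, 4], 3))
```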


Example

The following command computes a trendline over event counts by month, using a 5-month window:

source=t | stats count(date_month) | trendline sma(5, count) AS trend | fields  trend

The next command would compute a 5-point simple moving average of the 'cpu_usage' field and store it in a new field called 'smooth_cpu'.

source=t| trendline sma(5,cpu_usage) as smooth_cpu

Multiple trendlines could be calculated in a single command, such as:

| trendline sma(10,memory) as mem_trend wma(5,network_traffic) as net_trend

Support for PPL trendline functionality is required for both:

- OpenSearch based PPL engine

- Spark based PPL engine


@YANG-DB YANG-DB added enhancement New feature or request untriaged PPL Piped processing language labels Sep 11, 2024
@YANG-DB YANG-DB self-assigned this Sep 11, 2024
@LantaoJin
Member

Hi @YANG-DB, could you provide some background on how trendline differs from stats? Based on the examples above, I don't see the key difference between them.

@YANG-DB
Member Author

YANG-DB commented Sep 12, 2024

Hi,
The 'stats' command only supports a standard average, without a sliding window. We could add that support to 'stats', but I think separating the sliding-window average from standard stats will simplify actual usage.
In addition, dedicated trend commands like this exist in other prominent pipeline languages.

@LantaoJin
Member

LantaoJin commented Sep 13, 2024

My concern is that there could be more window-function-related requests in PPL. I would prefer to add one new command that supports all of them instead of introducing specific sma and wma functions. My thought is to add a fundamental, general-purpose syntax similar to streamstats.
The example "compute a 5-point simple moving average of the 'cpu_usage' field and store it in a new field called 'smooth_cpu'" could be written as

| streamstats window=5 current=false global=false avg(cpu_usage) as smooth_cpu

or named trendstats
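
To make the comparison concrete, here is a minimal sketch of a streamstats-style windowed average. It assumes SPL-like semantics that are not spelled out in this thread: `window` caps the number of events used, partial windows are still averaged, and `current=False` excludes the current event. The function name `windowed_avg` is hypothetical.

```python
from typing import List, Optional


def windowed_avg(
    values: List[float], window: int, current: bool = True
) -> List[Optional[float]]:
    """Average over up to `window` events ending at (or just before) each row."""
    out: List[Optional[float]] = []
    for t in range(len(values)):
        end = t + 1 if current else t   # assumed: current=False skips row t
        start = max(0, end - window)    # assumed: partial windows are allowed
        chunk = values[start:end]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out


print(windowed_avg([1, 2, 3, 4], 2, current=True))
print(windowed_avg([1, 2, 3, 4], 2, current=False))
```

Under these assumptions, `current=False` averages only the preceding events, so the result lags by one row relative to the SMA(t) definition in the issue description, which includes the point at t.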

@YANG-DB YANG-DB removed the untriaged label Sep 16, 2024
@jduo

jduo commented Oct 8, 2024

Hello, I've started the trendline PPL implementation.
I've ported the lexer and parser changes over from the Spark PR.

Implementation-wise I think this should be very similar to how average is implemented, except the AggregationState should factor in the window specification. Does this seem like the right way to go?

One difference between this and average is that this is a root level command rather than a sub-command of stats
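
The idea of an aggregation state that factors in the window can be sketched as follows. This is a hypothetical Python illustration, not the plugin's actual AggregationState class: it keeps a running sum plus the last `period` values so each new row is handled in constant time. The class name `SlidingAverageState` and the `None` result for incomplete windows are assumptions.

```python
from collections import deque
from typing import Deque, Optional


class SlidingAverageState:
    """Hypothetical per-field state for a moving average of fixed period."""

    def __init__(self, period: int) -> None:
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.window: Deque[float] = deque()
        self.total = 0.0

    def add(self, value: float) -> Optional[float]:
        """Consume one row and return the current average, if defined."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.period:
            # Evict the oldest value so the sum covers exactly `period` rows.
            self.total -= self.window.popleft()
        if len(self.window) < self.period:
            return None  # assumption: undefined until the window is full
        return self.total / self.period


state = SlidingAverageState(3)
print([state.add(v) for v in [1, 2, 3, 4, 5]])
```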

@jduo

jduo commented Oct 10, 2024

Also, noticed that the syntax suggested above differs from the SPL definition of trendline:
https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Trendline

Notably, in SPL, the period argument is sort of embedded into moving average type instead of being within the parentheses of the moving average type
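
To illustrate the difference, the syntax proposed in this issue writes the period as an argument, e.g. sma(5, cpu_usage), while the SPL form attaches it to the type name, e.g. sma5(cpu_usage). The sketch below normalizes both spellings to the same triple. It is a toy regex, not the PPL grammar, and the function name `parse_trend` is hypothetical; only the sma and wma types from this issue are handled.

```python
import re
from typing import Tuple

# Proposed form in this issue: sma(5, cpu_usage)
PROPOSED = re.compile(r"^(sma|wma)\(\s*(\d+)\s*,\s*(\w+)\s*\)$", re.IGNORECASE)
# SPL-style form: sma5(cpu_usage)
SPL_STYLE = re.compile(r"^(sma|wma)(\d+)\(\s*(\w+)\s*\)$", re.IGNORECASE)


def parse_trend(expr: str) -> Tuple[str, int, str]:
    """Return (trend type, period, field) for either spelling."""
    expr = expr.strip()
    for pattern in (PROPOSED, SPL_STYLE):
        match = pattern.match(expr)
        if match:
            kind, period, field = match.groups()
            return kind.lower(), int(period), field
    raise ValueError(f"unrecognized trendline expression: {expr!r}")


print(parse_trend("sma(5, cpu_usage)"))
print(parse_trend("sma5(cpu_usage)"))
```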

@YANG-DB
Member Author

YANG-DB commented Oct 11, 2024

@jduo I would like to rethink this.

streamstats is another function we could implement in a more generic manner, but I still see value in having a trendline implementation of its own. @penghuo @ykmr1224 @dai-chen what do you think?

@YANG-DB
Member Author

YANG-DB commented Oct 11, 2024

> Also, noticed that the syntax suggested above differs from the SPL definition of trendline: https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Trendline
>
> Notably, in SPL, the period argument is sort of embedded into moving average type instead of being within the parentheses of the moving average type

@jduo thanks.
I'd like to reduce the complexity for the first iteration. It does resemble the stats command, so maybe we can think of extending stats instead?
Let me know what you think and whether you have any specific suggestions here.

@jduo

jduo commented Oct 11, 2024

> > Also, noticed that the syntax suggested above differs from the SPL definition of trendline: https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/Trendline
> > Notably, in SPL, the period argument is sort of embedded into moving average type instead of being within the parentheses of the moving average type
>
> @jduo thanks I'd like to reduce the complexity for the first iteration - it does resemble the stats command - maybe we can think of extending stats instead ? let me know what you think and if you have any specific suggestion here ?

@YANG-DB
I made some good progress on the window-function-like version of trendline yesterday, before I saw this comment. I have parsing and logical planning working. I'm going to continue down this path today to see if I can get execution working. I can put up a PR now if you'd like to see it, or wait until execution works.

@YANG-DB
Member Author

YANG-DB commented Oct 11, 2024

@jduo thanks for the update
Please continue and create the PR in draft mode

@YANG-DB YANG-DB removed their assignment Oct 11, 2024
@jduo jduo mentioned this issue Oct 12, 2024
7 tasks
@jduo

jduo commented Oct 14, 2024

Should the alias part of the trendline command be mandatory? As a user, I would expect it to be optional, with the original field name used if it is omitted. In SPL it is optional.

Projects
Status: In Progress