Provide information about the quality of a resampled metric #1021
In my opinion this is interesting for formulas, e.g. to know how many `None`s were ignored in the calculation.
@frequenz-floss/python-sdk-team unless someone steps in and shows a use case for this, I think I will close.
We have often seen lower data rates from components without warning because of site-specific issues. I have seen this happen many times, including last week. Apps need to be able to identify degraded data quality so that they know to be more conservative in their goals. Without it, they will assume that the latest values have a higher accuracy and will overshoot.
But if we assume a small sampling period, which is what we want to aim for (1s), then you know that the data rate is low or the quality of the data is bad because the resampler will start producing `None` values.
So one suggestion was to use the `SourceProperties`.
I think the resampler shouldn't produce `None`.
I think it is, because like you said, it tracks source info already and just has to send out one value at startup, and later, whenever the source info is recalculated.
Let's see if we are talking about the same thing. When? If data is not coming, then yes, it should produce `None`. If it happens sporadically, we should be able to recover when the data comes back at the normal rate.
What do you mean by "adjust to the max data age"? Do you mean it should adjust the `max_data_age_in_periods`?
Yeah, but it is done for different reasons. Again, the global resampler is just a way to homogenize the input data, assuming the data that comes... comes, and comes at a reasonable rate. If we have no data, the resampler should return `None`.

So this issue is only about knowing if the data for the last 3 seconds (according to the current defaults we use: a resampling period of 1s and a `max_data_age_in_periods` of 3) is of good quality.
OK, looking at the code, I have some interesting findings that I forgot about:
```python
max_data_age_in_periods: float = 3.0
"""The maximum age a sample can have to be considered *relevant* for resampling.

Expressed in number of periods, where period is the `resampling_period`
if we are downsampling (resampling period bigger than the input period) or
the *input sampling period* if we are upsampling (input period bigger than
the resampling period).

It must be bigger than 1.0.

Example:
    If `resampling_period` is 3 seconds, the input sampling period is
    1 and `max_data_age_in_periods` is 2, then data older than 3*2
    = 6 seconds will be discarded when creating a new sample and never
    passed to the resampling function.

    If `resampling_period` is 3 seconds, the input sampling period is
    5 and `max_data_age_in_periods` is 2, then data older than 5*2
    = 10 seconds will be discarded when creating a new sample and never
    passed to the resampling function.
"""
```
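The rule in that docstring can be sketched as a small standalone helper (hypothetical function name, not the SDK API — just the arithmetic the docstring describes):

```python
def max_data_age(
    resampling_period: float,
    input_period: float,
    max_data_age_in_periods: float = 3.0,
) -> float:
    """Return the maximum sample age (in seconds) considered for resampling.

    The reference period is the *longer* of the two periods: the
    resampling period when downsampling, the input period when upsampling.
    """
    if max_data_age_in_periods <= 1.0:
        raise ValueError("max_data_age_in_periods must be bigger than 1.0")
    reference_period = max(resampling_period, input_period)
    return reference_period * max_data_age_in_periods

# The two examples from the docstring:
print(max_data_age(3.0, 1.0, 2.0))  # downsampling: 3 * 2 = 6.0 seconds
print(max_data_age(3.0, 5.0, 2.0))  # upsampling: 5 * 2 = 10.0 seconds
```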
So if some location is sending samples every 5 seconds (consistently and from the start), the resampler should be able to cope with it without issues; data for the last 15 seconds should be used to calculate the current sample. If this isn't happening, maybe we have a bug in the resampler.
Are you sure that this is done if the input data is not on a fixed sampling period? IIUC it can also be `None`, which I assumed would be used if we use the raw data as input.
I didn't get what you mean by "the input data is not on a fixed sampling period".
If we resample irregular sample periods, e.g. if it's done on the raw data from the components, I am not sure we can rely on that.
So, if we are downsampling, the data considered for the current window is always a fixed time span (`resampling_period * max_data_age_in_periods`).

But also for the downsampling case, if a source is flaky at the beginning, we might consider that we are actually upsampling the source, because the data rate is too low. Once it recovers, it should be switched to downsampling. I'm not saying this is what we want, I'm just saying this is what the resampler is doing right now.
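That switching behavior can be sketched as follows (illustrative only; the helper name is made up, and the real resampler infers the input period from observed timestamps):

```python
def window_span(
    resampling_period: float,
    observed_input_period: float,
    max_data_age_in_periods: float = 3.0,
) -> tuple[str, float]:
    """Classify the source and return the relevant-data window in seconds."""
    if observed_input_period > resampling_period:
        # The source is slower than the output rate: we are upsampling,
        # so the window stretches with the (possibly flaky) input period.
        return "upsampling", observed_input_period * max_data_age_in_periods
    # The source is at least as fast as the output rate: downsampling,
    # with a fixed window of resampling_period * max_data_age_in_periods.
    return "downsampling", resampling_period * max_data_age_in_periods

# A source that starts flaky (10s between samples) and then recovers (0.2s):
print(window_span(1.0, 10.0))  # ('upsampling', 30.0)
print(window_span(1.0, 0.2))   # ('downsampling', 3.0)
```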
What's needed?
We need a way to inform users about the quality of a resampled metric.
For example, if a sample was calculated from only one very old value, the data quality should be low, while if it was calculated from many up-to-date samples, the quality should be high.
This way actors could make more informed decisions on how to use that data.
Proposed solution

- Expose `SourceProperties` via the resampling actor
- Have `FormulaEngine`s aggregate `SourceProperties` statistics from the components they use and expose their own statistics

Use cases
No response
Alternatives and workarounds
No response
Additional context
No response