
Mitigate the impact of outliers on the fee rate statistics #394

Closed
Keith-CY opened this issue Aug 2, 2023 · 24 comments
Labels: documentation (Improvements or additions to documentation)

Keith-CY (Member) commented Aug 2, 2023

[screenshot: fee rate samples chart]

Some high-fee-rate samples appear around 30~50 s, which makes the average fee rate of the slow stage higher than that of the fast stage.

A filter could be applied to the data as follows (see the sketch below):

  1. if a sample in a lower speed stage has a higher fee rate than a sample in a higher speed stage, it should be removed
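A minimal sketch of this rule, assuming each sample carries a fee rate (shannons/kB) and a speed-stage index derived from its confirmation time; the type and function names are illustrative, not the explorer's actual code:

```ts
interface FeeRateSample {
  feeRate: number          // shannons/kB
  confirmationTime: number // seconds
  stage: number            // 0 = fastest stage, larger = slower
}

function dropCrossStageOutliers(samples: FeeRateSample[]): FeeRateSample[] {
  const maxStage = Math.max(...samples.map(s => s.stage))
  // For each stage, record the lowest fee rate seen in any faster stage.
  const minFeeOfFasterStages: number[] = []
  let runningMin = Infinity
  for (let stage = 0; stage <= maxStage; stage++) {
    minFeeOfFasterStages[stage] = runningMin
    const feesAtStage = samples.filter(s => s.stage === stage).map(s => s.feeRate)
    if (feesAtStage.length > 0) runningMin = Math.min(runningMin, ...feesAtStage)
  }
  // A slower sample that pays more than some faster sample is wiped out.
  return samples.filter(s => s.feeRate <= minFeeOfFasterStages[s.stage])
}
```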

Any thoughts, @Danie0918 @Sven-TBD?

Keith-CY added the documentation label on Aug 2, 2023
Danie0918 (Contributor) commented:

How is this anomalous data generated? If there is no reference value, data denoising can be used to identify the anomalous data and clean it up.

Keith-CY (Member, Author) commented Aug 2, 2023

> How is this anomalous data generated? If there is no reference value, data denoising can be used to identify the anomalous data and clean it up.

They are collected from real-world data, and I think they are reasonable: there aren't many transactions in the pool, so ordering is not obviously impacted by the fee rate.

Danie0918 (Contributor) commented:

Perhaps we can avoid this by sorting by fee and calculating the average elapsed time. For example, take the top 50% of fees and calculate the fast time, take the bottom 50% and calculate the slow time, and take the average of all fees and calculate the average time.
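A rough sketch of this split, assuming each sample pairs a fee rate with an elapsed confirmation time in seconds; the names are made up for illustration:

```ts
interface Sample { feeRate: number; elapsed: number }

const average = (xs: number[]): number =>
  xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length

function estimateTimesByFeeHalves(samples: Sample[]) {
  const sorted = [...samples].sort((a, b) => b.feeRate - a.feeRate) // highest fee first
  const half = Math.ceil(sorted.length / 2)
  return {
    fastTime: average(sorted.slice(0, half).map(s => s.elapsed)), // top 50% of fees
    slowTime: average(sorted.slice(half).map(s => s.elapsed)),    // bottom 50% of fees
    averageTime: average(sorted.map(s => s.elapsed)),             // all fees
  }
}
```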

Keith-CY (Member, Author) commented Aug 3, 2023

> Perhaps we can avoid this by sorting by fee and calculating the average elapsed time. For example, take the top 50% of fees and calculate the fast time, take the bottom 50% and calculate the slow time, and take the average of all fees and calculate the average time.

But that cannot pick out the outliers. In this case,

[screenshot: fee rate samples chart]

the highest fee rates appear around 30~50 s, so their average time is about 40 s, while the lowest fee rates appear around 10~30 s, so their average time is about 20 s.

The highest fee rates are still mapped to the longer elapsed time.

Danie0918 (Contributor) commented Aug 3, 2023

> Perhaps we can avoid this by sorting by fee and calculating the average elapsed time. For example, take the top 50% of fees and calculate the fast time, take the bottom 50% and calculate the slow time, and take the average of all fees and calculate the average time.
>
> But that cannot pick out the outliers. In this case, the highest fee rates appear around 30~50 s, so their average time is about 40 s, while the lowest fee rates appear around 10~30 s, so their average time is about 20 s.
>
> The highest fee rates are still mapped to the longer elapsed time.

Typically, a higher fee means a shorter confirmation time, but data from different periods cannot be referenced together because of the peaks and valleys in trading activity. Perhaps we could add a time limit to the nearly 10,000 sampled transactions, such as within the last hour, so that the data is closer to the current situation.

Danie0918 (Contributor) commented:

The data is real but inaccurate due to the time span; simply filtering out the data with large deviations would not correctly reflect the actual situation. So we can consider adding a limit on the reference data: only take transaction data within the last 1,000 blocks, and, to avoid gaps caused by having no data, set a default value of 2,000 shannons/kB within 10 seconds.
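A sketch of the proposed limit, assuming samples record the block in which the transaction was committed and the current tip block number is known; only the 1,000-block window and the 2,000 shannons/kB / 10 s default come from the comment above, the rest is illustrative:

```ts
interface TxSample { feeRate: number; confirmationTime: number; committedBlock: number }

// Fallback so the chart never shows a gap when there is no recent data.
const DEFAULT_SAMPLE: TxSample = { feeRate: 2000, confirmationTime: 10, committedBlock: 0 }

function recentSamples(samples: TxSample[], tipBlockNumber: number): TxSample[] {
  const windowed = samples.filter(s => tipBlockNumber - s.committedBlock < 1000)
  return windowed.length > 0 ? windowed : [DEFAULT_SAMPLE]
}
```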

By the way, this restriction applies only to the fee-rate-tracker reference chart; the other data charts are not adjusted, because the other charts are statistics while this one is a summary chart used for reference.

[screenshot: fee rate tracker chart]

Keith-CY (Member, Author) commented:

> The data is real but inaccurate due to the time span; simply filtering out the data with large deviations would not correctly reflect the actual situation. So we can consider adding a limit on the reference data: only take transaction data within the last 1,000 blocks, and, to avoid gaps caused by having no data, set a default value of 2,000 shannons/kB within 10 seconds.
>
> By the way, this restriction applies only to the fee-rate-tracker reference chart; the other data charts are not adjusted, because the other charts are statistics while this one is a summary chart used for reference.

LGTM, but IMO the count of samples should be related to the count of transactions instead of blocks. What about limiting the count of samples to 10 * TPS, so it becomes dynamic based on on-chain activity?

Danie0918 (Contributor) commented:

> LGTM, but IMO the count of samples should be related to the count of transactions instead of blocks. What about limiting the count of samples to 10 * TPS, so it becomes dynamic based on on-chain activity?

That is also workable; the block limit is mainly meant to bound the timeframe and prevent long time spans when transaction activity is low.

Keith-CY (Member, Author) commented:

> LGTM, but IMO the count of samples should be related to the count of transactions instead of blocks. What about limiting the count of samples to 10 * TPS, so it becomes dynamic based on on-chain activity?
>
> That is also workable; the block limit is mainly meant to bound the timeframe and prevent long time spans when transaction activity is low.

When will we handle this? It's a bit bothersome to users.

Keith-CY (Member, Author) commented:

BTW, the solution mentioned above only shrinks the time frame of the exceptional status. The algorithm still needs to be optimized to remove outliers from the samples.

Danie0918 (Contributor) commented:

> BTW, the solution mentioned above only shrinks the time frame of the exceptional status. The algorithm still needs to be optimized to remove outliers from the samples.

According to the latest data, the anomalies are obviously large (more than 10 times the normal value), so we can add a suitable cutoff to filter them, for example filtering out data above 100,000 shannons/kB.

So we can start with the following program (a sketch follows the list):

1. Filter out data above 100,000 shannons/kB.
2. Limit the count of samples to 10 * TPS.
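A sketch of this transitional program, assuming a hardcoded cutoff of 100,000 shannons/kB and a sample cap of 10 * TPS applied to the most recent samples; the field and helper names are illustrative:

```ts
interface Sample { feeRate: number; confirmationTime: number }

const MAX_FEE_RATE = 100_000 // shannons/kB

function selectSamples(samples: Sample[], tps: number): Sample[] {
  const maxCount = Math.max(1, Math.round(10 * tps))
  return samples
    .filter(s => s.feeRate <= MAX_FEE_RATE) // 1. drop abnormally large fee rates
    .slice(-maxCount)                       // 2. keep at most 10 * TPS of the latest samples
}
```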

Keith-CY (Member, Author) commented Sep 14, 2023

> BTW, the solution mentioned above only shrinks the time frame of the exceptional status. The algorithm still needs to be optimized to remove outliers from the samples.
>
> According to the latest data, the anomalies are obviously large (more than 10 times the normal value), so we can add a suitable cutoff to filter them, for example filtering out data above 100,000 shannons/kB.
>
> So we can start with the following program:
>
> 1. Filter out data above 100,000 shannons/kB.
> 2. Limit the count of samples to 10 * TPS.

Rule 1,

> 1. Filter out data above 100,000 shannons/kB.

uses a hardcoded value that will be imprecise in most cases. For example, when there is a jam on chain, the fee rates may all be greater than 100,000 shannons/kB, and all samples would be filtered out.


I would suggest filtering outliers by the following rule instead (sketched below):

  1. Order samples by confirmation time, from short to long.
  2. Check every two adjacent samples; if the longer-confirmation one uses a higher fee rate, it should be removed from the samples.

By doing so, the filtered fee rates increase monotonically with confirmation speed, i.e. they never increase as the confirmation time grows.
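A minimal sketch of this rule, assuming samples are { feeRate, confirmationTime } pairs; instead of a strictly pairwise check, a single pass compares each sample with the last kept one, which produces the monotone result described above:

```ts
interface Sample { feeRate: number; confirmationTime: number }

function enforceMonotonicFeeRates(samples: Sample[]): Sample[] {
  // 1. Order samples by confirmation time, from short to long.
  const sorted = [...samples].sort((a, b) => a.confirmationTime - b.confirmationTime)
  const kept: Sample[] = []
  for (const sample of sorted) {
    const last = kept[kept.length - 1]
    // 2. A later-confirmed sample paying a higher fee rate than the previously
    //    kept one is treated as an outlier and dropped.
    if (last !== undefined && sample.feeRate > last.feeRate) continue
    kept.push(sample)
  }
  return kept
}
```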

Keith-CY (Member, Author) commented Sep 14, 2023

For rule 2,

> 2. Limit the count of samples to 10 * TPS.

in case the TPS is very low, which means the on-chain bandwidth is enough for most transactions, we expand the samples with dummy 1,000 shannons/kB entries to, say, 1,000 samples in total.

By doing so, the fee rate will stay close to 1,000 shannons/kB when there aren't many transactions.


I would suggest using 100 as the threshold based on current activity (see the sketch below).
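A sketch of the padding idea, assuming a minimum sample count of 100 (the threshold suggested above) and dummy entries at 1,000 shannons/kB; the dummy confirmation time is an illustrative placeholder:

```ts
interface Sample { feeRate: number; confirmationTime: number }

const MIN_SAMPLE_COUNT = 100
const DUMMY_SAMPLE: Sample = { feeRate: 1000, confirmationTime: 10 }

function padWithDummySamples(samples: Sample[]): Sample[] {
  const missing = Math.max(0, MIN_SAMPLE_COUNT - samples.length)
  // When on-chain activity is low, the dummy samples pull the recommendation
  // toward the minimal fee rate.
  return samples.concat(Array.from({ length: missing }, () => ({ ...DUMMY_SAMPLE })))
}
```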

Keith-CY (Member, Author) commented:

A temporary solution (nervosnetwork/ckb-explorer-frontend@672233c) was submitted as a hotfix to avoid exceptional samples in the production environment.

Danie0918 (Contributor) commented:

> 1. Filter out data above 100,000 shannons/kB.

This is a transitional program, and the values can be adjusted as appropriate.

> I would suggest filtering outliers by the following rule instead:
>
> 1. Order samples by confirmation time, from short to long.
> 2. Check every two adjacent samples; if the longer-confirmation one uses a higher fee rate, it should be removed from the samples.
>
> By doing so, the filtered fee rates increase monotonically with confirmation speed.

Removing long, high-fee samples based on confirmation-time sorting may remove normal data if the busyness of transactions on the chain changes while sampling over a uniform interval.


Returning to the question at hand, I think removing data noise is an appropriate solution (a smoothing sketch follows the list):

  1. Mean-value denoising: eliminate noise by calculating the average of the data over time and comparing it to each data point.
  2. Convolution denoising: eliminate noise by smoothing the data with a convolution kernel.
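A minimal sketch of option 2, smoothing the fee rate series with a uniform moving-average kernel; the kernel size is an arbitrary example:

```ts
function smoothFeeRates(feeRates: number[], kernelSize = 5): number[] {
  const half = Math.floor(kernelSize / 2)
  return feeRates.map((_, i) => {
    // Average the values inside the window centered on index i.
    const window = feeRates.slice(Math.max(0, i - half), Math.min(feeRates.length, i + half + 1))
    return window.reduce((sum, v) => sum + v, 0) / window.length
  })
}
```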

> I would suggest using 100 as the threshold based on current activity.

Regarding data sampling, 100 is set as the threshold, but a time limit, such as within 1 hour, also needs to be added; too large a time span of data would not reflect the current situation in a timely manner.

Keith-CY (Member, Author) commented Sep 14, 2023

> Removing long, high-fee samples based on confirmation-time sorting may remove normal data if the busyness of transactions on the chain changes while sampling over a uniform interval.

This filter adopts the logic that miners use: in general, a tx with a higher fee rate is mined first, which means a tx confirmed later should not, theoretically, come with a higher fee rate.


> Mean-value denoising: eliminate noise by calculating the average of the data over time and comparing it to each data point.

Removing data above or beneath specific values won't remove the outliers, because the outliers may all sit inside the valid range.

Say the original samples are as follows:

[chart: original fee rate samples]

By removing values above or beneath specific thresholds, it becomes:

[chart: samples after thresholding]

It still fluctuates, and the fee rate at longer confirmation times can still be high.

> Convolution denoising: eliminate noise by smoothing the data with a convolution kernel.

Same as above: if the algorithm only smooths the original curve, the trend won't be fixed.


> Regarding data sampling, 100 is set as the threshold, but a time limit, such as within 1 hour, also needs to be added; too large a time span of data would not reflect the current situation in a timely manner.

The suggestion at #394 (comment) covers 2 aspects:

  1. a minimal count of samples;
  2. inserting dummy samples when real-world samples are not enough.

If the TPS is very low, say 2 transactions/minute, which is similar to having no transactions within 1 hour, many dummy samples at 1,000 shannons/kB will be inserted to keep the trend close to the minimal fee rate.

Keith-CY (Member, Author) commented:

An interquartile-range (IQR) filter was added by nervosnetwork/ckb-explorer-frontend#1411.
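For reference, a generic interquartile-range filter looks roughly like the sketch below; this is not necessarily the exact implementation in ckb-explorer-frontend#1411:

```ts
// Drop values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
function filterByIQR(values: number[]): number[] {
  if (values.length === 0) return []
  const sorted = [...values].sort((a, b) => a - b)
  const quantile = (q: number): number => {
    const pos = (sorted.length - 1) * q
    const base = Math.floor(pos)
    const rest = pos - base
    const next = sorted[base + 1] ?? sorted[base]
    return sorted[base] + rest * (next - sorted[base])
  }
  const q1 = quantile(0.25)
  const q3 = quantile(0.75)
  const iqr = q3 - q1
  return values.filter(v => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr)
}
```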

Keith-CY (Member, Author) commented:

The lowest fee rate should be recommended when the on-chain bandwidth is not fully occupied, so I would suggest adding a new strategy as follows (sketched after the list):

  1. set a threshold for the acceptable duration;
  2. get the average duration of several low-fee-rate samples;
  3. if the average duration <= threshold, set the low fee rate as the recommended fee rate.
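A sketch of this strategy, assuming the "several low fee rates" are the N cheapest recent samples; N, the names, and the choice of returning the group's highest fee rate are illustrative assumptions:

```ts
interface Sample { feeRate: number; confirmationTime: number }

function recommendLowFeeRate(
  samples: Sample[],
  acceptableDuration: number, // 1. the threshold, in seconds
  lowSampleCount = 10,
): number | undefined {
  // 2. Take the cheapest samples and average their confirmation durations.
  const cheapest = [...samples].sort((a, b) => a.feeRate - b.feeRate).slice(0, lowSampleCount)
  if (cheapest.length === 0) return undefined
  const avgDuration = cheapest.reduce((sum, s) => sum + s.confirmationTime, 0) / cheapest.length
  // 3. If the cheapest transactions still confirm quickly enough, recommend their fee rate.
  return avgDuration <= acceptableDuration ? cheapest[cheapest.length - 1].feeRate : undefined
}
```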

Keith-CY (Member, Author) commented:

> The lowest fee rate should be recommended when the on-chain bandwidth is not fully occupied, so I would suggest adding a new strategy as follows:
>
> 1. set a threshold for the acceptable duration;
> 2. get the average duration of several low-fee-rate samples;
> 3. if the average duration <= threshold, set the low fee rate as the recommended fee rate.

The threshold will be set to 60 s.

Keith-CY (Member, Author) commented:

> The lowest fee rate should be recommended when the on-chain bandwidth is not fully occupied, so I would suggest adding a new strategy as follows:
>
> 1. set a threshold for the acceptable duration;
> 2. get the average duration of several low-fee-rate samples;
> 3. if the average duration <= threshold, set the low fee rate as the recommended fee rate.
>
> The threshold will be set to 60 s.

This will be updated by nervosnetwork/ckb-explorer-frontend@a92790d.

The strategy is slightly tweaked: the threshold is updated dynamically from the average block time and is set to 2 * average block time.
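A one-line sketch of the tweaked threshold, assuming the average block time is available in seconds (the function name is illustrative):

```ts
// Acceptable confirmation duration: twice the average block time instead of a fixed 60 s.
const acceptableDuration = (avgBlockTimeSeconds: number): number => 2 * avgBlockTimeSeconds
```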

Keith-CY (Member, Author) commented:

> The lowest fee rate should be recommended when the on-chain bandwidth is not fully occupied, so I would suggest adding a new strategy as follows:
>
> 1. set a threshold for the acceptable duration;
> 2. get the average duration of several low-fee-rate samples;
> 3. if the average duration <= threshold, set the low fee rate as the recommended fee rate.
>
> The threshold will be set to 60 s.

That means the low fee rates are considered ideal if they get transactions committed within 2 blocks.

Keith-CY (Member, Author) commented:

This is already on mainnet and testnet. It's hard to test unless we send numerous transactions on testnet.

FrederLu commented:
> This is already on mainnet and testnet. It's hard to test unless we send numerous transactions on testnet.

A large amount of data can be generated on testnet in a short time, with 1,501 pieces of data per block, but no change in the fee rate has been observed yet.

Keith-CY (Member, Author) commented May 20, 2024

> This is already on mainnet and testnet. It's hard to test unless we send numerous transactions on testnet.
>
> A large amount of data can be generated on testnet in a short time, with 1,501 pieces of data per block, but no change in the fee rate has been observed yet.

This is impacted by issue #665.

There are always hundreds of transactions with a 1,000 shannons/kB fee rate in the history, even though all recent transactions come with a high fee rate.
