Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

frequencies: Limit the range of frequency pivots to the observed data when no start and end date are provided #1132

Open
huddlej opened this issue Jan 19, 2023 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@huddlej
Copy link
Contributor

huddlej commented Jan 19, 2023

Context

This issue arose as part of a conversation about pivot intervals for frequencies and what reasonable start/end dates should be.

Description

The time points when we evaluate frequencies should not exceed the date range of the observed data unless the user has requested a specific start and/or end date. As @rneher notes in the conversation linked above:

Pivots for KDE frequencies should never extend more than the narrow width beyond the last time window were there is more or less representative data. They would be too heavily influenced by fluctuations in that case. This is less of an issue for the diffusion frequencies.

Instead of setting the lower and upper date bounds based on the pivot frequency like so:

pivot_start = start_date if start_date else np.floor(np.min(observations) / pivot_frequency) * pivot_frequency
pivot_end = end_date if end_date else np.ceil(np.max(observations) / pivot_frequency) * pivot_frequency

we should consider at least using a max date of the latest observation like so:

pivot_start = start_date if start_date else np.floor(np.min(observations) / pivot_frequency) * pivot_frequency
pivot_end = end_date if end_date else np.max(observations)

The latest approach to calculating pivots would guarantee that the last pivot matches the latest data point. However, there is no guarantee that a lower date bound based on the earliest observation would match that observation. We could change the logic of the pivot calculations to ensure that the earliest observation (or start date) is always included, though. This isn't a large change conceptually, but it would require us to update the expected behavior of the frequencies API as represented in our unit tests.

@huddlej huddlej added the enhancement New feature or request label Jan 19, 2023
@huddlej huddlej self-assigned this Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Status: Backlog
Development

No branches or pull requests

1 participant