-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[New Check] Entire column of Timeseries is NaN #229
base: dev
Are you sure you want to change the base?
Changes from 10 commits
da3908f
50a5b54
2bf4289
54fc7d0
72a4564
60d280c
194f16a
5a99410
23d2e6a
7377c26
6d2c54c
847f0d3
fb842e0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -92,3 +92,38 @@ def check_resolution(time_series: TimeSeries): | |
return InspectorMessage( | ||
message=f"'resolution' should use -1.0 or NaN for unknown instead of {time_series.resolution}." | ||
) | ||
|
||
|
||
@register_check(importance=Importance.BEST_PRACTICE_SUGGESTION, neurodata_type=TimeSeries) | ||
def check_rows_not_nan(time_series: TimeSeries, nelems=200): | ||
"""Check that each row of a TimeSeries has at least one non-NaN piece of data.""" | ||
n_dims = len(time_series.data.shape) | ||
if n_dims > 2: | ||
yield | ||
|
||
subframe_selection = np.unique( | ||
np.round(np.linspace(start=0, stop=time_series.data.shape[0] - 1, num=nelems)).astype(int) | ||
) | ||
if n_dims == 1: | ||
if not all(np.isnan(time_series.data[:nelems])): | ||
yield | ||
|
||
if all(np.isnan(time_series.data[subframe_selection])): | ||
yield InspectorMessage( | ||
message=( | ||
"This TimeSeries appears to contain NaN data at each frame. " | ||
"Consider removing this object from the file." | ||
) | ||
) | ||
elif n_dims == 2: | ||
for col in range(time_series.data.shape[1]): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what if there are 1000s of columns? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 3 SpikeGLX probes would hit that amount... Not that I've seen anyone do more than 2 currently, but we can think towards the future. Could be faster if we did the Same case could be made for the If you're suggesting to subselect over columns as well then here, I'd just point out that could ultimately reduce the accuracy if there are only a small number of Now, that's aside from the other idea I have for pulling data chunk-wise whenever possible, and fully utilizing each chunks data (described below). |
||
if not all(np.isnan(time_series.data[:nelems, col]).flatten()): | ||
continue | ||
|
||
if all(np.isnan(time_series.data[subframe_selection, col]).flatten()): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there a way of modifying this code to minimize the number of h5py dataset reads? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, we could take the approach of pulling out and checking data in packets of 'chunks' (would be much more efficient for streaming). In the worst case, as it currently stands, the slicing might span One idea could be to check if the This is also all separate from the first early exit condition of this entire check, which only examines the first I can think more about the total data usage that this function might require and report some approximate numbers. But yeah, we might need to be a bit more restrictive than usual with the |
||
yield InspectorMessage( | ||
message=( | ||
f"Column index {col} of this TimeSeries appears to contain NaN data at each frame. " | ||
"Consider removing this column from the TimeSeries." | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you could move this to line 110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the clever logic we devised from the tables PR now (direct slicing with 'by' amount calculated from shape)