Create index on index column for stat data files and use in fetch #103

Open
taldcroft opened this issue Feb 25, 2015 · 0 comments

Comments

@taldcroft
Member

This code in fetch is really slow because it reads the entire index column just to locate the requested time range:

            import tables
            h5 = tables.openFile(os.path.join(*filename))
            table = h5.root.data
            times = (table.col('index') + 0.5) * dt  # <<< READ ENTIRE COLUMN
            row0, row1 = np.searchsorted(times, [tstart, tstop])
            table_rows = table[row0:row1]  # returns np.ndarray (structured array)
            h5.close()
            return (times[row0:row1], table_rows, row0, row1)

Instead, create a PyTables index on the index column of each 5min and daily h5 file using h5.root.data.cols.index.createIndex(). This is a one-time operation for existing files, but update_archive.py also needs a fix so that the index is created on the path where it creates a stat file fresh.
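A minimal sketch of the one-time indexing step, written against the PyTables 3 snake_case names (open_file, create_index) rather than the camelCase names used above. The helper name ensure_index_column_indexed and the demo file layout (an index column plus a made-up val column) are assumptions for illustration, not the actual stat file schema.

```python
import os
import tempfile

import numpy as np
import tables


def ensure_index_column_indexed(h5_path):
    """Create an index on the 'index' column of /data if none exists yet.

    Returns True if a new index was created, False if one was already there.
    (Hypothetical helper, not part of the existing code base.)
    """
    with tables.open_file(h5_path, mode="a") as h5:
        col = h5.root.data.cols.index
        if col.is_indexed:
            return False
        col.create_index()
        return True


# Demo on a throwaway file with an assumed two-column layout.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_stat.h5")
rows = np.zeros(1000, dtype=[("index", "i8"), ("val", "f8")])
rows["index"] = np.arange(1000)
with tables.open_file(demo_path, mode="w") as h5:
    h5.create_table(h5.root, "data", obj=rows)

created_first = ensure_index_column_indexed(demo_path)   # builds the index
created_again = ensure_index_column_indexed(demo_path)   # no-op second time
```

Making the helper idempotent means the same call could be used both for the one-time pass over existing files and on the path that creates a fresh stat file.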

Once the index exists, invert the logic above: compute index_start and index_stop from tstart and tstop, then read only the required rows with readWhere(...). This appears to reduce read times for short queries to less than 1 microsec, vs. 225 microsec now.
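The inversion could look roughly like the sketch below. It assumes the same relation as the snippet above, times = (index + 0.5) * dt, and the default left-sided searchsorted behaviour, so the selected rows satisfy tstart <= (index + 0.5) * dt < tstop. The function name index_bounds and all numeric values are hypothetical; the cross-check uses bisect from the standard library in place of np.searchsorted.

```python
import math
from bisect import bisect_left


def index_bounds(tstart, tstop, dt):
    """Return (index_start, index_stop) so that index_start <= index < index_stop
    selects the rows with tstart <= (index + 0.5) * dt < tstop.
    """
    index_start = math.ceil(tstart / dt - 0.5)
    index_stop = math.ceil(tstop / dt - 0.5)
    return index_start, index_stop


# Hypothetical numbers purely for illustration.
dt = 328.0
indexes = list(range(100, 200))              # stand-in for the stored 'index' column
times = [(i + 0.5) * dt for i in indexes]
tstart, tstop = 40000.0, 50000.0

index_start, index_stop = index_bounds(tstart, tstop, dt)

# Condition string of the kind a PyTables in-kernel query accepts.
condition = "(index >= {}) & (index < {})".format(index_start, index_stop)

# Cross-check against the full-column search used in the original snippet.
row0 = bisect_left(times, tstart)
row1 = bisect_left(times, tstop)
selected = [i for i in indexes if index_start <= i < index_stop]
```

With the index in place, the condition string would be passed to readWhere (read_where in PyTables 3) instead of reading the whole column. Query times that fall exactly on a bin centre may need extra care because of floating-point rounding, and the sketch does not cover how to recover row0 and row1 for the existing return value.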
