Create index on index column for stat data files and use in fetch #103

taldcroft · 2015-02-25T20:25:40Z

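This code is really slow because it reads the entire index column on every fetch, no matter how short the requested time range is: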

            import tables
            h5 = tables.openFile(os.path.join(*filename))
            table = h5.root.data
            times = (table.col('index') + 0.5) * dt  # <<< READ ENTIRE COLUMN
            row0, row1 = np.searchsorted(times, [tstart, tstop])
            table_rows = table[row0:row1]  # returns np.ndarray (structured array)
            h5.close()
            return (times[row0:row1], table_rows, row0, row1)

Instead, create a PyTables index on the index column of each 5min and daily h5 file using h5.root.data.cols.index.createIndex(). This is a one-time operation for existing files, but update_archive.py also needs a fix so that the index gets created on the path where it creates a stat file from scratch.

Once the files are indexed, turn the above logic around: compute index_start and index_stop directly from tstart and tstop, then get only the required rows with readWhere(...). This appears to reduce read times for short queries to less than 1 microsec, vs. 225 microsec now.
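
A rough sketch of what the two steps could look like, not a tested patch. It assumes the PyTables 3 snake_case names (open_file, create_index, read_where) rather than the camelCase spellings used above. The helper names index_bounds, ensure_index and read_stat_rows are hypothetical. The bounds come from inverting times = (index + 0.5) * dt so that the selected rows match what np.searchsorted picks in the current code, though floating-point rounding could differ for a time that lands exactly on a bin center.

```python
import math


def index_bounds(tstart, tstop, dt):
    """Return (index_start, index_stop) such that rows with
    index_start <= index < index_stop satisfy
    tstart <= (index + 0.5) * dt < tstop."""
    index_start = math.ceil(tstart / dt - 0.5)
    index_stop = math.ceil(tstop / dt - 0.5)
    return index_start, index_stop


def ensure_index(filename):
    """One-time step: index the 'index' column if it is not indexed yet."""
    import tables  # PyTables, imported lazily so index_bounds works without it

    with tables.open_file(filename, mode="a") as h5:
        col = h5.root.data.cols.index
        if not col.is_indexed:
            col.create_index()


def read_stat_rows(filename, tstart, tstop, dt):
    """Read only the rows in [tstart, tstop) using an indexed query."""
    import tables

    index_start, index_stop = index_bounds(tstart, tstop, dt)
    with tables.open_file(filename, mode="r") as h5:
        table_rows = h5.root.data.read_where(
            "(index >= index_start) & (index < index_stop)",
            condvars={"index_start": index_start, "index_stop": index_stop},
        )
    # Times are rebuilt from the selected rows only, not the whole column.
    times = (table_rows["index"] + 0.5) * dt
    return times, table_rows
```

Unlike the current code, this sketch does not return row0 and row1. If callers still need the row numbers, Table.get_where_list() with the same condition returns the matching row coordinates.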

taldcroft added the enhancement label Feb 25, 2015
