Refactor to allow non-quadtree arrow datasets #27

bmschmidt · 2022-06-14T14:09:43Z

The visualization vocabulary here is better than most other WebGL scatterplot libraries, and most datasets don't have more than a few million points anyway.

So there should be a subclass of Tile that accepts a buffer of Arrow IPC data as an argument and returns a set of tiles, one per record batch. This would not be a single-tile representation, because the fastest and most memory-efficient approach would be to use the chunks (record batches) that already exist in the Arrow table.
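A minimal sketch of the one-tile-per-record-batch idea, assuming a recent `apache-arrow` JavaScript release that exports `tableFromIPC`, `tableToIPC`, and `tableFromArrays`. `IPCBatchTile`, `tilesFromIPC`, and `n_points` are hypothetical names; this is not deepscatter's actual Tile subclass, which would also need its rendering and lifecycle methods.

```ts
import { tableFromArrays, tableFromIPC, tableToIPC, RecordBatch } from "apache-arrow";

// Hypothetical stand-in for a Tile subclass: it wraps one record batch
// from the decoded table instead of regrouping points spatially.
class IPCBatchTile {
  constructor(readonly key: string, readonly record_batch: RecordBatch) {}

  get n_points(): number {
    return this.record_batch.numRows;
  }
}

// Decode an Arrow IPC buffer and return one tile per record batch,
// reusing the chunking that the producer of the buffer already chose.
function tilesFromIPC(buffer: Uint8Array): IPCBatchTile[] {
  const table = tableFromIPC(buffer);
  return table.batches.map((batch, i) => new IPCBatchTile(`batch-${i}`, batch));
}

// Usage: build a two-batch table, serialize it, and split it into tiles.
const first = tableFromArrays({ x: new Float32Array([0, 1, 2]), y: new Float32Array([3, 4, 5]) });
const second = tableFromArrays({ x: new Float32Array([6, 7]), y: new Float32Array([8, 9]) });
const ipc = tableToIPC(first.concat(second), "stream");
const tiles = tilesFromIPC(ipc);
console.log(tiles.map((t) => t.n_points)); // expected: [3, 2]
```

Because the tiles are just the batches of the decoded table, whoever writes the IPC buffer controls tile sizes: a producer that writes one huge batch gets one huge tile.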

For now, users would be responsible for serializing an Arrow table themselves. I can make an Observable notebook showing how to do this from a CSV using Arquero, but I don't want to have to shove Arquero into the already-large deepscatter library.
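For users doing the serialization themselves, here is a dependency-light sketch of the CSV-to-IPC step, assuming the `apache-arrow` JavaScript package. To keep it short it swaps Arquero for a naive hand-rolled parser that only reads numeric `x` and `y` columns with no quoting; `csvToArrowIPC` is a hypothetical helper, not part of deepscatter or Arquero.

```ts
import { tableFromArrays, tableFromIPC, tableToIPC } from "apache-arrow";

// Parse a CSV with numeric `x` and `y` columns into an Arrow IPC stream.
// Assumes a header row, "\n" line endings, and no quoted fields; any
// other columns are ignored.
function csvToArrowIPC(csv: string): Uint8Array {
  const lines = csv.trim().split("\n");
  const header = lines[0].split(",").map((s) => s.trim());
  const xi = header.indexOf("x");
  const yi = header.indexOf("y");
  if (xi < 0 || yi < 0) throw new Error("CSV needs x and y columns");
  const rows = lines.slice(1);
  const x = new Float32Array(rows.length);
  const y = new Float32Array(rows.length);
  rows.forEach((row, i) => {
    const cells = row.split(",");
    x[i] = Number(cells[xi]);
    y[i] = Number(cells[yi]);
  });
  // "stream" selects the Arrow IPC streaming format; "file" would add a footer.
  return tableToIPC(tableFromArrays({ x, y }), "stream");
}

// Usage: round-trip the bytes to check what was written.
const decoded = tableFromIPC(csvToArrowIPC("x,y,label\n0.5,1.5,a\n2,3,b\n"));
console.log(decoded.numRows); // expected: 2
```

A real pipeline should use a proper CSV reader (Arquero, as suggested above) so that quoting, missing values, and string columns are handled.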

bmschmidt · 2022-08-04T14:24:52Z

Turns out the direction this is going is to pull back a little from the "Quadtree is all you need" approach I've been using. Some of the methods that currently live in the root quad tile will instead live in a new abstract class called Dataset, which can be extended to run on quadtrees but also on Arrow tables, DuckDB databases, Arquero tables, or whatever else.

This will also formalize the idea that tiles are things that can generate Arrow record batches. The number of methods each backend needs isn't very large, although most backends will also need new subclasses of Tile (to handle things like descendants).

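```ts
// Excerpt from the proposed refactor. Tile, Scatterplot, and TileWorker
// are defined elsewhere in deepscatter and are not shown here.
export abstract class Dataset<T extends Tile> {
  abstract root_tile: T;
  // public mutations:
  public max_ix: number = -1;
  protected plot: Scatterplot;
  protected _tileworkers: TileWorker[] = [];
  abstract ready: Promise<void>;
  constructor(plot: Scatterplot) {
    this.plot = plot;
    // ... (rest of the constructor and class omitted in the original comment)
  }
}
```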
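A toy version of the Dataset idea, showing how backend-agnostic code can rely only on tiles that yield record batches. Everything here is hypothetical: `SketchTile`, `SketchDataset`, `BatchTile`, `TableRootTile`, `ArrowTableDataset`, and `count_points` are illustrative names, the plot and worker fields from the excerpt are left out, and giving a flat Arrow table a data-less root whose children are its existing record batches is one assumed design, not necessarily the one that was merged. It assumes the `apache-arrow` JavaScript package.

```ts
import { tableFromArrays, RecordBatch, Table } from "apache-arrow";

// Hypothetical tile contract: a tile may own a record batch and may have
// children. Quadtree tiles would have up to four children; tiles backed
// by a flat Arrow table have none.
abstract class SketchTile {
  abstract readonly record_batch: RecordBatch | null;
  abstract readonly children: SketchTile[];

  // Depth-first list of every tile below this one.
  descendants(): SketchTile[] {
    const out: SketchTile[] = [];
    for (const child of this.children) {
      out.push(child, ...child.descendants());
    }
    return out;
  }
}

// Backend-agnostic dataset: this method only uses the tile contract.
abstract class SketchDataset<T extends SketchTile> {
  abstract readonly root_tile: T;

  count_points(): number {
    let n = this.root_tile.record_batch?.numRows ?? 0;
    for (const tile of this.root_tile.descendants()) {
      n += tile.record_batch?.numRows ?? 0;
    }
    return n;
  }
}

// Leaf tile wrapping one existing record batch.
class BatchTile extends SketchTile {
  readonly children: SketchTile[] = [];
  constructor(readonly record_batch: RecordBatch) {
    super();
  }
}

// Root with no data of its own; its children are the table's batches.
class TableRootTile extends SketchTile {
  readonly record_batch = null;
  readonly children: SketchTile[];
  constructor(table: Table) {
    super();
    this.children = table.batches.map((batch) => new BatchTile(batch));
  }
}

class ArrowTableDataset extends SketchDataset<TableRootTile> {
  readonly root_tile: TableRootTile;
  constructor(table: Table) {
    super();
    this.root_tile = new TableRootTile(table);
  }
}

// Usage: a table made of two record batches (3 rows and 2 rows).
const table = tableFromArrays({ x: new Float32Array([0, 1, 2]) })
  .concat(tableFromArrays({ x: new Float32Array([3, 4]) }));
const dataset = new ArrowTableDataset(table);
console.log(dataset.root_tile.children.length, dataset.count_points()); // expected: 2 5
```

A quadtree backend would subclass `SketchTile` with up to four children per tile; `count_points`, and any other method written against the tile contract, would work unchanged.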

bmschmidt · 2022-08-16T14:02:05Z

This is merged.

bmschmidt mentioned this issue Jul 19, 2022: Inline plots for Jupyter and R #29 (open)

bmschmidt changed the title from "Allow scatterplots straight on arrow memory." to "Refactor to allow non-quadtree arrow datasets" on Aug 4, 2022

bmschmidt mentioned this issue Aug 12, 2022: Initial test suite. #39 (open)

bmschmidt closed this as completed Aug 16, 2022
