Refactor to allow non-quadtree arrow datasets #27

Closed
bmschmidt opened this issue Jun 14, 2022 · 2 comments

Comments

@bmschmidt
Collaborator

The visualization vocabulary here is better than that of most other WebGL scatterplot libraries, and most datasets don't have more than a few million points anyway.

So there should be a subclass of Tile that accepts a buffer of Arrow IPC as an argument and returns a set of tiles corresponding to the record batches. This would not be a single-tile representation, because the fastest and most memory-efficient approach would be to use the chunks that already exist on the Arrow dataframe.
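A minimal sketch of the "one tile per record batch" idea, not the library's actual code. `BatchLike`, `BatchTile`, and `tilesFromBatches` are hypothetical names; `BatchLike` is a plain stand-in for an Arrow record batch so the example runs without dependencies. With apache-arrow, the batches would presumably come from `tableFromIPC(buffer).batches`.

```typescript
// Hypothetical stand-in for an Arrow RecordBatch: named columns of equal length.
interface BatchLike {
  numRows: number;
  columns: Record<string, Float32Array>;
}

// Hypothetical tile that wraps exactly one batch without copying its data.
class BatchTile {
  key: string;
  batch: BatchLike;
  min_ix: number; // global index of the first row held by this tile

  constructor(key: string, batch: BatchLike, min_ix: number) {
    this.key = key;
    this.batch = batch;
    this.min_ix = min_ix;
  }

  // Global index of the last row held by this tile.
  get max_ix(): number {
    return this.min_ix + this.batch.numRows - 1;
  }
}

// Build one tile per batch, reusing whatever chunking the table already has.
function tilesFromBatches(batches: BatchLike[]): BatchTile[] {
  const tiles: BatchTile[] = [];
  let offset = 0;
  batches.forEach((batch, i) => {
    tiles.push(new BatchTile(`batch-${i}`, batch, offset));
    offset += batch.numRows;
  });
  return tiles;
}

// Two small batches standing in for the chunks of a deserialized table.
const demoBatches: BatchLike[] = [
  { numRows: 3, columns: { x: new Float32Array([0, 1, 2]), y: new Float32Array([5, 4, 3]) } },
  { numRows: 2, columns: { x: new Float32Array([3, 4]), y: new Float32Array([2, 1]) } },
];
const demoTiles = tilesFromBatches(demoBatches);
```

Each tile keeps a reference to its batch rather than a copy, which is the memory argument made above; the running `offset` gives every point a stable global index across tiles.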

For now, users would be responsible for serializing an Arrow table themselves. I can make an Observable notebook showing how to do this from a CSV using arquero, but I don't want to have to shove arquero into the already-large deepscatter library.

@bmschmidt
Collaborator Author

Turns out the direction this is going is to pull back a little bit from the "Quadtree is all you need" approach I've been using. Some of the methods that currently live in the root quadtree tile will instead live in a new abstract class called Dataset, which can be extended to run on quadtrees but also on Arrow tables, DuckDB databases, arquero tables, whatever.

This will also formalize the idea that tiles are things that can generate Arrow record batches. The number of methods for each one of these isn't extremely large, although they'll mostly need new subclasses of Tile as well (to handle things like descendants).

export abstract class Dataset<T extends Tile> {
  abstract root_tile: T;
  // public mutations :
  public max_ix: number = -1;
  protected plot: Scatterplot;
  protected _tileworkers: TileWorker[] = [];
  abstract ready: Promise<void>;
  constructor(plot: Scatterplot) {
    this.plot = plot;
  }
  // (the snippet is cut off here in the original comment)
}
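
To illustrate how an abstract dataset class of this shape might be extended, here is a self-contained sketch under stated assumptions. `PlotLike`, `SketchTile`, `SketchDataset`, `QuadTile`, `FlatTile`, `QuadtreeDataset`, and `BatchDataset` are all hypothetical stand-ins rather than deepscatter's real classes, and the `tiles()` traversal is only an example of a method that could live on the dataset instead of the root tile.

```typescript
// Hypothetical stand-in for the plot; the real Scatterplot class is far larger.
interface PlotLike { name: string }

abstract class SketchTile {
  key: string;
  constructor(key: string) { this.key = key; }
  // Direct children of this tile; empty for leaf tiles.
  abstract children(): SketchTile[];
}

abstract class SketchDataset<T extends SketchTile> {
  abstract root_tile: T;
  abstract ready: Promise<void>;
  max_ix = -1;
  protected plot: PlotLike;
  constructor(plot: PlotLike) { this.plot = plot; }

  // Depth-first walk defined once here, so every backend gets it for free.
  *tiles(): Generator<SketchTile> {
    const stack: SketchTile[] = [this.root_tile];
    while (stack.length > 0) {
      const tile = stack.pop() as SketchTile;
      yield tile;
      stack.push(...tile.children());
    }
  }
}

// Quadtree backend: tiles nest to arbitrary depth.
class QuadTile extends SketchTile {
  private kids: QuadTile[];
  constructor(key: string, kids: QuadTile[] = []) { super(key); this.kids = kids; }
  children(): QuadTile[] { return this.kids; }
}

class QuadtreeDataset extends SketchDataset<QuadTile> {
  root_tile: QuadTile;
  ready: Promise<void> = Promise.resolve();
  constructor(plot: PlotLike, root: QuadTile) { super(plot); this.root_tile = root; }
}

// Record-batch backend: a root whose children are one flat tile per batch.
class FlatTile extends SketchTile {
  private kids: FlatTile[];
  constructor(key: string, kids: FlatTile[] = []) { super(key); this.kids = kids; }
  children(): FlatTile[] { return this.kids; }
}

class BatchDataset extends SketchDataset<FlatTile> {
  root_tile: FlatTile;
  ready: Promise<void> = Promise.resolve();
  constructor(plot: PlotLike, n_batches: number) {
    super(plot);
    const leaves: FlatTile[] = [];
    for (let i = 0; i < n_batches; i++) leaves.push(new FlatTile(`batch-${i}`));
    this.root_tile = new FlatTile("root", leaves);
  }
}

const demoPlot: PlotLike = { name: "demo" };
const quadRoot = new QuadTile("0/0/0", [
  new QuadTile("1/0/0", [new QuadTile("2/0/0")]),
  new QuadTile("1/1/0"),
]);
const quadKeys = Array.from(new QuadtreeDataset(demoPlot, quadRoot).tiles(), (t) => t.key).sort();
const batchKeys = Array.from(new BatchDataset(demoPlot, 3).tiles(), (t) => t.key).sort();
```

Both datasets expose the same `tiles()` traversal even though one is a nested quadtree and the other is a flat list of batches under a synthetic root; that shared surface is the point of moving methods off the root tile.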

@bmschmidt changed the title from "Allow scatterplots straight on arrow memory." to "Refactor to allow non-quadtree arrow datasets" on Aug 4, 2022
@bmschmidt
Collaborator Author

This is merged.
