Refactor to allow non-quadtree arrow datasets #27
It turns out the direction this is going is to pull back a little from the "Quadtree is all you need" approach I've been using, and to let some of the methods that currently live in the root quad tile live instead in a new abstract class called `Dataset`. This will also formalize the idea that tiles are things that can generate Arrow record batches. The number of methods for each of these isn't especially large, although they'll mostly be new subclasses.

    export abstract class Dataset<T extends Tile> {
      // The top-level tile; concrete subclasses decide what kind of tile it is.
      abstract root_tile : T;
      // public mutations :
      // Highest point index seen so far; -1 until data has loaded.
      public max_ix : number = -1;
      protected plot : Scatterplot;
      protected _tileworkers: TileWorker[] = [];
      // Resolves once the dataset is usable.
      abstract ready : Promise<void>;
      constructor(plot : Scatterplot) {
        this.plot = plot;
      }
    }
This is merged.
The visualization vocabulary here is better than that of most other WebGL scatterplot libraries, and most datasets don't have more than a few million points anyway.
So there should be a subclass of Tile that accepts a buffer of Arrow IPC as an argument and returns a set of tiles corresponding to the record batches. This would not be a single-tile representation, because the fastest and most memory-efficient approach would be to use the chunks that already exist on the Arrow dataframe.
For now, users would be responsible for serializing an Arrow table themselves. I can make an Observable notebook showing how to do this from a CSV using Arquero, but I don't want to have to shove Arquero into the already-large deepscatter library.
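As a rough illustration of the one-tile-per-record-batch idea, here is a minimal, dependency-free sketch. It is not the deepscatter implementation: `BatchLike`, `ArrowTile`, `ArrowDataset`, the `tiles` array, and the `record_batch()` method are hypothetical names, the `Scatterplot` and tile-worker plumbing is left out, and treating `max_ix` as the total row count minus one is an assumption. In practice the batches would come from deserializing the Arrow IPC buffer rather than from plain objects.

```typescript
// Hypothetical stand-in for an Arrow record batch (assumption: only the
// row count matters for this sketch).
interface BatchLike {
  numRows: number;
}

// A tile is anything that can hand back a record batch.
abstract class Tile {
  abstract key: string;
  abstract record_batch(): BatchLike;
}

// One tile wraps exactly one pre-existing chunk of the table.
class ArrowTile extends Tile {
  key: string;
  private batch: BatchLike;

  constructor(batch: BatchLike, index: number) {
    super();
    this.key = `batch-${index}`;
    this.batch = batch;
  }

  record_batch(): BatchLike {
    return this.batch;
  }
}

// Trimmed-down version of the abstract dataset: no plot, no workers.
abstract class Dataset<T extends Tile> {
  abstract root_tile: T;
  abstract tiles: T[];
  abstract ready: Promise<void>;
  public max_ix: number = -1;
}

// Non-quadtree dataset: the tiles are a flat list, one per record batch.
class ArrowDataset extends Dataset<ArrowTile> {
  root_tile: ArrowTile;
  tiles: ArrowTile[];
  ready: Promise<void>;

  constructor(batches: BatchLike[]) {
    super();
    if (batches.length === 0) {
      throw new Error("ArrowDataset needs at least one record batch");
    }
    // Reuse the existing chunking instead of re-tiling the data.
    this.tiles = batches.map((batch, i) => new ArrowTile(batch, i));
    // Assumption: the first batch serves as the root tile.
    this.root_tile = this.tiles[0];
    // Assumption: max_ix is the index of the last row across all batches.
    this.max_ix = batches.reduce((sum, b) => sum + b.numRows, 0) - 1;
    // Everything is already in memory, so the dataset is ready immediately.
    this.ready = Promise.resolve();
  }
}
```

Constructing `new ArrowDataset([{ numRows: 3 }, { numRows: 5 }])` would give two tiles whose batches are used as-is, which is the point of the proposal: no copying or re-chunking of the Arrow data.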