Work with Table Loaded Using Arquero #116
Comments
It's hard to tell if Arquero is storing Arrow internally or not. It looks like they may not be. You should print out the schema of the table you're trying to pass in to deck-layers, but yeah presumably Arquero is messing with the nested geometry column. "Soon" we'll have another release of geoarrow-rust for JavaScript, which will bring Arrow-backed geospatial operations to WebAssembly. Though it won't have general dataframe operations. |
You know, now that I think about it, I think Arquero stores it as an array, not arrow, which seems less than awesome from a memory point of view. My real goal is to be able to plot some subset of the data, say based on drop down select inputs the user can select. I know there is |
Digging deeper, it looks like Arquero stores the columns as arrow columns. So I'm guessing it's not able to do that with the geometry column. |
You should import then export back to Arrow, and print out the change in the table schema from that transformation.
deck.gl v9 was just released, and while the official website hasn't been updated yet, you can see from https://felixpalmer.github.io/deck.gl/docs/whats-new that category filtering was added to the |
OMG, that will be great. You should see the hoops I'm jumping through to filter the Arrow data.
// Assumes Arrow (apache-arrow) and aq / op (Arquero) are already in scope.
const GEOARROW_POINT_DATA = "https://jaredlander.com/data/hood_centers.arrow";
const hoods = await Arrow.tableFromIPC(fetch(GEOARROW_POINT_DATA));
function filter_text(df, column, values) {
  // Start from an empty slice so the result keeps the original Arrow schema.
  let result = df.slice(0, 0);
  let df_aq = aq.from(df);
  // Use Arquero only to find the indices of the rows to keep.
  let rows_to_keep = df_aq
    .params({
      vals_to_check: values,
      column_to_use: column
    })
    .filter((d, $) => op.includes($.vals_to_check, d[$.column_to_use]))
    .indices();
  // Append the matching rows one at a time from the original Arrow table.
  rows_to_keep.forEach(d => result = result.concat(df.slice(d, d + 1)));
  return result;
}
let reduced_data = filter_text(hoods, 'BoroName', ['Manhattan', 'Queens']);
And that seems to work, but it's so fragile. |
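One way the filter above could be made less fragile is to skip the Arquero round trip, scan the column for matching row indices, merge consecutive indices into ranges, and slice once per range instead of once per row. The following is only a sketch under that assumption, not something taken from deck.gl-layers or Arquero. The helper names matchingIndices, contiguousRuns and filterByRuns are hypothetical, and the only table methods relied on are slice and concat, as in the code above.

```javascript
// Hypothetical helpers, not part of Arrow, Arquero, or deck.gl-layers.

// Return the row indices whose column value is in `allowed`.
// `column` can be any iterable of values.
function matchingIndices(column, allowed) {
  const wanted = new Set(allowed);
  const indices = [];
  let i = 0;
  for (const value of column) {
    if (wanted.has(value)) indices.push(i);
    i += 1;
  }
  return indices;
}

// Merge sorted row indices into half-open [start, end) ranges,
// so that each range needs only one slice() call.
function contiguousRuns(indices) {
  const runs = [];
  for (const i of indices) {
    const last = runs[runs.length - 1];
    if (last && last[1] === i) {
      last[1] = i + 1; // extend the current run
    } else {
      runs.push([i, i + 1]); // start a new run
    }
  }
  return runs;
}

// Slice `table` once per run and concatenate the pieces.
// Assumes `table` offers slice(begin, end) and concat(other), as used above.
function filterByRuns(table, runs) {
  let result = table.slice(0, 0);
  for (const [start, end] of runs) {
    result = result.concat(table.slice(start, end));
  }
  return result;
}
```

With the table from above, a call might look like filterByRuns(hoods, contiguousRuns(matchingIndices(hoods.getChild('BoroName'), ['Manhattan', 'Queens']))), assuming the column vector iterates over plain string values. Since every piece is sliced from the original Arrow table, the schema should be carried along unchanged, as it is in the row-by-row version.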
Oh that's your problem.
This is how the API is expected to be used. You should've gotten a type error when passing in If you look at any example from the readme, it calls |
Am I right that if you use |
This is irrelevant. That's GeoParquet metadata that is unused here. |
You are correct, it works with |
It would unintentionally work for a GeoArrow table, that is, if a column exists that is tagged with GeoArrow metadata, because we check for a GeoArrow column first (which we probably shouldn't): deck.gl-layers/src/scatterplot-layer.ts, lines 109 to 121 in b128f31.
If your schema above is correct, it says that there's no metadata on the geometry field, but maybe that's an error in maintaining that metadata when exporting to JSON. Otherwise, it's hard to imagine how that would work. I specifically check against GeoArrow types: deck.gl-layers/src/scatterplot-layer.ts, lines 122 to 135 in b128f31.
|
Yep, the metadata on the 'geometry' column is different.
hoods.schema.fields[3].metadata
new Map([
[
"ARROW:extension:metadata",
"\u0001\u0000\u0000\u0000\u0003\u0000\u0000\u0000crs�\u0000\u0000\u0000GEOGCS[\"WGS 84\",DATUM[\"WGS_1984\",SPHEROID[\"WGS 84\",6378137,298.257223563]],PRIMEM[\"Greenwich\",0],UNIT[\"degree\",0.0174532925199433,AUTHORITY[\"EPSG\",\"9122\"]],AXIS[\"Latitude\",NORTH],AXIS[\"Longitude\",EAST],AUTHORITY[\"EPSG\",\"4326\"]]"
],
[
"ARROW:extension:name",
"geoarrow.point"
]
])
Vs
hoods_arrow.schema.fields[3].metadata
new Map()
So basically, it's an accident that |
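The comparison above can be automated by walking both schemas and reporting which metadata keys disappear from each field. This is a minimal sketch, assuming each field exposes a name and a metadata Map as in the output above. diffFieldMetadata is a hypothetical helper name, not an Arrow or Arquero function.

```javascript
// Hypothetical helper: report metadata keys that exist on a field in
// `fieldsBefore` but are missing from the same-named field in `fieldsAfter`.
// Each field is expected to look like { name: string, metadata: Map }.
function diffFieldMetadata(fieldsBefore, fieldsAfter) {
  const afterByName = new Map(fieldsAfter.map((f) => [f.name, f]));
  const report = [];
  for (const field of fieldsBefore) {
    const other = afterByName.get(field.name);
    // A field that vanished entirely is treated as having lost all its keys.
    const otherMeta = other ? other.metadata : new Map();
    const lostKeys = [...field.metadata.keys()].filter((key) => !otherMeta.has(key));
    if (lostKeys.length > 0) report.push({ name: field.name, lostKeys });
  }
  return report;
}
```

Calling diffFieldMetadata(hoods.schema.fields, hoods_arrow.schema.fields) on the two tables above would be expected to list the geometry field with both of its ARROW:extension keys as lost.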
Yes. I'll "fix" it so that it always fails.
The reason this library is fast is that it can copy data directly from Arrow memory to the GPU. Deck.gl's function accessors are indeed a simple and approachable API, but they can add significant overhead and memory use because new buffers have to be created. This means that whenever possible we want entire columns, not functions. For example, lonboard works solely in terms of buffers and never uses row-based function accessors on the frontend. Even for things like colors and radii, we serialize an entire column of values into a single buffer, and then on the frontend copy those directly to the GPU. |
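To make the difference concrete, here is a toy illustration using plain typed arrays. It does not show deck.gl's or this library's internals. packWithAccessor and viewColumn are hypothetical names, and the interleaved [x0, y0, x1, y1, ...] layout is an assumption for the example.

```javascript
// Row-based path: call an accessor for every row and write the results
// into a newly allocated buffer.
function packWithAccessor(rows, getPosition) {
  const out = new Float32Array(rows.length * 2);
  rows.forEach((row, i) => {
    const [x, y] = getPosition(row);
    out[2 * i] = x;
    out[2 * i + 1] = y;
  });
  return out;
}

// Column-based path: the coordinates already sit in one interleaved typed
// array, so a view over the same memory can be handed on without copying.
function viewColumn(coords, startRow, endRow) {
  return coords.subarray(2 * startRow, 2 * endRow);
}
```

The accessor path allocates and fills a second buffer of the same size as the data, while the column path only creates a view over memory that already exists.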
In that case, can we change the default? For a regular ScatterplotLayer, the default is |
The default is to infer the geometry column based on the GeoArrow metadata. So in your initial example you don't need to pass anything because the metadata exists on the column. Any time that you have one geometry column and it's tagged with GeoArrow metadata, you don't need to pass in I'm not going to change that default behavior because that's much more stable than using the name "geometry". |
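A rough sketch of what metadata-based inference could look like follows, assuming fields that expose a name and a metadata Map with the ARROW:extension:name key shown earlier in the thread. The prefix check and the findGeoArrowFields name are simplifications made up for this example. The library itself is described above as checking against specific GeoArrow types.

```javascript
const EXTENSION_NAME_KEY = "ARROW:extension:name";

// Hypothetical helper: list the fields whose extension name marks them as
// GeoArrow columns, for example "geoarrow.point".
function findGeoArrowFields(fields) {
  return fields
    .filter((f) => (f.metadata.get(EXTENSION_NAME_KEY) || "").startsWith("geoarrow."))
    .map((f) => ({ name: f.name, extension: f.metadata.get(EXTENSION_NAME_KEY) }));
}
```

If exactly one field comes back, that column can be used as the geometry without the caller naming it. A table whose metadata was stripped, like the round-tripped one above, would return an empty list.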
Makes sense. Thanks! |
Data manipulation capabilities were removed from Arrow a while back and they now suggest using Arquero for operations such as filtering, summarizing, etc.
I would love to be able to manipulate the table before passing it to various layers, for example, I might want to filter out certain rows.
From what I can tell, the layers require that the data be loaded with Arrow.tableFromIPC() while, of course, Arquero requires the data be loaded with aq.loadArrow().
I can convert from the Arrow-loaded data to an Arquero type and use Arquero functions. I can even send that back to Arrow style, but then it is unusable with geoarrow/deck.gl-layers.
For instance (again, need to download the data file somewhere since my website doesn't support CORS).
If we read the data directly into Arquero we can filter it and transfer it to Arrow and see we get the correct number of rows. (For some reason we can't convert back to Arrow if we first imported FROM Arrow.)
However, this newly created zones_arrow cannot be used for mapping, resulting in this error.
I'm guessing this is because Arquero doesn't know how to properly handle geometry columns. It would be really nice if we could perform some sort of manipulation on the data before plotting, but I'm not quite sure how to accomplish this.