docs: correct docstring for ak.metadata_from_parquet #2050

Merged: 2 commits, Dec 31, 2022
12 changes: 7 additions & 5 deletions src/awkward/operations/ak_metadata_from_parquet.py
@@ -35,6 +35,7 @@ def metadata_from_parquet(
         scan_files (bool): TODO

     This function differs from ak.from_parquet._metadata as follows:
+
     * this function will always use a _metadata file, if present
     * if there is no _metadata, the schema comes from _common_metadata or
       the first data file
@@ -46,9 +47,10 @@ def metadata_from_parquet(
       (use `.type` to get a high-level type),
     * `fs`: the fsspec filesystem object,
     * `paths`: a list of matching path names,
-    * `metadata`: the Parquet metadata, which includes `.num_rows` for the length
-      of the array that would be read by #ak.from_parquet and `.num_row_groups`
-      for the units that can be filtered (for the #ak.from_parquet `row_groups`
+    * `col_counts`: the number of rows in each row group,
+    * `columns`: the columns defined by the schema,
+    * `num_rows`: the length of the array that would be read by #ak.from_parquet ,
+    * `num_row_groups`: the units that can be filtered (for the #ak.from_parquet `row_groups`
       argument).

     See also #ak.from_parquet, #ak.to_parquet.
@@ -81,14 +83,14 @@ def _impl(

     out = {
         "form": subform,
         "fs": fs,
         "paths": actual_paths,
+        "col_counts": col_counts,
         "columns": parquet_columns,
     }
     if col_counts:
-        out["num_row_groups"] = len(col_counts)
-        out["col_counts"] = col_counts
         out["num_rows"] = sum(col_counts)
+        out["num_row_groups"] = len(col_counts)
     else:
         out["num_rows"] = None
         out["num_row_groups"] = None
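
For readers checking the documented keys against the code, here is a minimal sketch of how `num_rows` and `num_row_groups` follow from `col_counts` in the last hunk. It assumes `col_counts` is either a list of per-row-group row counts or a falsy value such as `None`. The helper name `summarize_row_groups` is hypothetical and not part of awkward; it only mirrors the tail of `_impl` shown in the diff.

```python
def summarize_row_groups(col_counts):
    """Hypothetical helper mirroring the tail of _impl in the diff above.

    col_counts: assumed to be a list with one row count per row group,
    or a falsy value (None or an empty list) when counts are unavailable.
    """
    # "col_counts" is always present in the result, as in the updated dict literal.
    out = {"col_counts": col_counts}
    if col_counts:
        # Total length of the array that would be read.
        out["num_rows"] = sum(col_counts)
        # Number of units that can be selected by row group.
        out["num_row_groups"] = len(col_counts)
    else:
        out["num_rows"] = None
        out["num_row_groups"] = None
    return out


summary = summarize_row_groups([100, 100, 42])
missing = summarize_row_groups(None)
```

One consequence of the truthiness check is that an empty list takes the `else` branch, so `num_rows` would be `None` rather than `0` in that case.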