
Commit

docs: RDataFrame add multi-threaded example (#2269)
* docs: add multi-threaded example

* docs: fix multithreading execution

* Update docs/user-guide/how-to-convert-rdataframe.md

Co-authored-by: Angus Hollands <[email protected]>

* Update docs/user-guide/how-to-convert-rdataframe.md

Co-authored-by: Angus Hollands <[email protected]>

* ci: print tracebacks

* ci: don't cache mutable env

* Revert "ci: don't cache mutable env"

This reverts commit 1288d15.

* ci: fix docs environment

* ci: use correct flags 🤦

---------

Co-authored-by: Angus Hollands <[email protected]>
ianna and agoose77 authored Mar 2, 2023
1 parent f7a779a commit ef3e2a9
Showing 3 changed files with 38 additions and 4 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
@@ -191,7 +191,7 @@ jobs:
path: dist

- name: Install awkward and awkward-cpp wheels
- run: python -m pip install dist/awkward*.whl
+ run: python -m pip install dist/awkward*.whl --force-reinstall --no-deps

- name: Generate build files
run: pipx run nox -s prepare -- --docs --headers
@@ -215,7 +215,7 @@ jobs:
echo "DOCS_VERSION=main" >> $GITHUB_ENV
- name: Generate Python documentation
- run: sphinx-build -M html . _build/
+ run: sphinx-build -M html . _build/ -T
working-directory: docs

- name: Upload docs artefact
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -22,6 +22,7 @@ fsspec
s3fs
h5py
matplotlib
awkward
uproot
uproot3
jax>=0.2.7;python_version>="3.6" and sys_platform != "win32"
37 changes: 35 additions & 2 deletions docs/user-guide/how-to-convert-rdataframe.md
@@ -58,7 +58,7 @@ The dictionary key defines a column name in RDataFrame.
df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})
```

-The {func} `ak.to_rdataframe` function presents a generated on demand Awkward Array view as an `RDataFrame` source. There is a small overhead of generating Awkward RDataSource C++ code. This operation does not execute the `RDataFrame` event loop. The array data are not copied.
+The {func}`ak.to_rdataframe` function presents a generated-on-demand Awkward Array view as an `RDataFrame` source. There is a small overhead of generating Awkward RDataSource C++ code. This operation does not execute the `RDataFrame` event loop. The array data are not copied.

The column readers are generated based on the run-time type of the views. Here is a description of the `RDataFrame` columns:

@@ -74,7 +74,7 @@ Awkward Arrays are dynamically typed, so in a C++ context, the type name is hash
From RDataFrame to Awkward
--------------------------

-The function for `RDataFrame` → Awkward conversion is {func}`ak.from_rdataframe`. The argument to this function requires a tuple of strings that are the `RDataFrame` column names. This function always returns
+The function for `RDataFrame` → Awkward conversion is {func}`ak.from_rdataframe`. The argument to this function accepts a tuple of strings that are the `RDataFrame` column names. By default this function returns

* {class}`ak.Array`

@@ -91,3 +91,36 @@ array = ak.from_rdataframe(
)
array
```

When `RDataFrame` runs multi-threaded event loops, the entry processing order is not guaranteed:

```{code-cell} ipython3
ROOT.ROOT.EnableImplicitMT()
```

+++

Let's recreate the dataframe so that it reflects the new multi-threading mode:

```{code-cell} ipython3
df = ak.to_rdataframe({"x": array_x, "y": array_y, "z": array_z})
```

+++

If the `keep_order` parameter is set to `True`, the columns keep their original entry order after filtering:

```{code-cell} ipython3
df = df.Filter("y % 2 == 0")
array = ak.from_rdataframe(
df,
columns=(
"x",
"y",
"z",
),
keep_order=True,
)
array
```
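
To illustrate why entry order can change under multi-threading, and what an option like `keep_order=True` has to undo, here is a minimal sketch in plain Python. It does not use ROOT or Awkward and is not their implementation: the data, the chunk boundaries, and the helper names `filter_chunk` and `run` are all hypothetical. It assumes that order can be restored by carrying each entry's original index through the filter and sorting on it afterwards.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical column values and chunk boundaries (one chunk per worker).
y = [3, 8, 1, 4, 6, 7, 2, 10, 5, 12]
chunks = [(0, 3), (3, 6), (6, 10)]


def filter_chunk(bounds):
    """Keep even values from one chunk, paired with their entry index."""
    start, stop = bounds
    # Random delay so that chunks finish in an unpredictable order.
    time.sleep(random.uniform(0, 0.01))
    return [(i, y[i]) for i in range(start, stop) if y[i] % 2 == 0]


def run(keep_order):
    """Filter all chunks in parallel and collect results as they complete."""
    collected = []
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(filter_chunk, c) for c in chunks]
        for future in as_completed(futures):
            collected.extend(future.result())
    if keep_order:
        # Restore the original entry order using the carried index.
        collected.sort(key=lambda pair: pair[0])
    return [value for _, value in collected]


print(run(keep_order=False))  # same values, order may vary between runs
print(run(keep_order=True))   # values in original entry order
```

Without the final sort, the result contains the same values but in whatever order the chunks happened to finish; with it, the filtered values follow the order of the input entries.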
