polars quantile example #1503

Closed
wants to merge 14 commits into from

Contributor

@mccalluc mccalluc commented Apr 16, 2024

  • Fixes #1500 (Example of polars quantile workflow)
  • Tweaked column names and presentation order for clarity
  • While playing with the numbers, I found the behavior of quantile confusing, but that doesn't need to block this.
  • We should also say where scale, alpha, and param are coming from.

@Shoeboxam Shoeboxam force-pushed the 1497-quantile-usability branch 2 times, most recently from 7e5be5f to 597c38c Compare April 17, 2024 02:52
* ``grouping-key``: integers between 1 and 5; the grouping key
* ``twice-key``: integers between 2 and 10
* ``ones``: the float 1.0

Member

The twice-key and ones columns may need more context to make the example less abstract, possibly including new names.

Contributor Author

By "less abstract", do you mean the dataset should actually look like an interesting, real-world dataset? I think examples like that can be useful, but if we're just trying to explain the machinery, constants etc. can make it easier to see what the DP adds.

I'll expand the explanation and maybe we can talk on Thursday.

Member

I like the thought of using a real-world data set, even if it's just PUMS. This could be the basis of the long-awaited cookbook docs page.

@mccalluc
Contributor Author

Going to move back to draft to add more context.

@mccalluc mccalluc marked this pull request as draft April 17, 2024 12:15
@mccalluc
Contributor Author

Going to take a break from this, and pick it up when alias is fixed upstream.

@Shoeboxam Shoeboxam force-pushed the 1497-quantile-usability branch 4 times, most recently from a05d5b2 to 63a8f25 Compare April 18, 2024 03:37
@mccalluc mccalluc force-pushed the 1500-polars-quantile-examples branch from 2c8c62d to 04e4694 Compare April 18, 2024 12:56
@mccalluc
Contributor Author

I'm still a bit confused by the behavior.

some data:

>>> private_lf = pl.LazyFrame([
...     pl.Series("grouping-key", [1, 2, 3, 4, 5] * 10, dtype=pl.Int32),
...     pl.Series("noisy-key", [1] * 10 + [1, 2, 3, 4, 5] * 6 + [5] * 10, dtype=pl.Int32),
...     pl.Series("ones", [1.0] * 50, dtype=pl.Float64),
... ])

a plan:

>>> quantiles_plan = empty_lf.group_by("grouping-key").agg([
...     pl.col("noisy-key")
...         .dp.mean(bounds=(1, 5), scale=2.)
...         .alias("mean"),
...     pl.col("noisy-key")
...         .dp.median(candidates=[1, 2, 3, 4, 5], scale=1.0)
...         .alias("median"),
...     pl.col("noisy-key")
...         .dp.median(candidates=[1, 3, 5], scale=1.0)
...         .alias("median 1/3/5"),
...     pl.col("noisy-key")
...         .dp.quantile(candidates=[1, 2, 3, 4, 5], alpha=0.1, scale=1.0)
...         .alias("10% quantile"),
...     pl.col("noisy-key")
...         .dp.quantile(candidates=[1, 2, 3, 4, 5], alpha=0.9, scale=1.0)
...         .alias("90% quantile"),
... ])

the result:

>>> print(quantiles_release) # doctest: +ELLIPSIS
shape: (5, 6)
┌──────────────┬──────┬────────┬──────────────┬──────────────┬──────────────┐
│ grouping-key ┆ mean ┆ median ┆ median 1/3/5 ┆ 10% quantile ┆ 90% quantile │
│ ---          ┆ ---  ┆ ---    ┆ ---          ┆ ---          ┆ ---          │
│ i32          ┆ f64  ┆ i64    ┆ i64          ┆ i64          ┆ i64          │
╞══════════════╪══════╪════════╪══════════════╪══════════════╪══════════════╡
│ 1            ┆ ...  ┆ 5      ┆ 5            ┆ 5            ┆ 1            │
│ 2            ┆ ...  ┆ 5      ┆ 5            ┆ 5            ┆ 1            │
│ 3            ┆ ...  ┆ 5      ┆ 5            ┆ 5            ┆ 1            │
│ 4            ┆ ...  ┆ 5      ┆ 5            ┆ 5            ┆ 1            │
│ 5            ┆ ...  ┆ 1      ┆ 1            ┆ 5            ┆ 1            │
└──────────────┴──────┴────────┴──────────────┴──────────────┴──────────────┘
  • Am I misusing median?
  • alpha works opposite from what I was expecting, but my expectations are likely wrong.
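
For comparison, here is a minimal non-private baseline for the same data, written in plain Python rather than polars or OpenDP. It assumes the nearest-rank definition of a quantile, which may not match how `dp.quantile` scores its candidates; `nearest_rank_quantile` is a hypothetical helper, not part of any library.

```python
import math

# Same columns as the LazyFrame above, as plain Python lists.
grouping_key = [1, 2, 3, 4, 5] * 10
noisy_key = [1] * 10 + [1, 2, 3, 4, 5] * 6 + [5] * 10


def nearest_rank_quantile(values, alpha):
    """Exact (non-private) alpha-quantile under the nearest-rank rule (an assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(alpha * len(ordered)))  # 1-based rank into the sorted values
    return ordered[rank - 1]


# Collect the noisy-key values that belong to each grouping-key.
groups = {}
for key, value in zip(grouping_key, noisy_key):
    groups.setdefault(key, []).append(value)

# Exact per-group statistics, with no noise added.
exact = {
    key: {
        "10% quantile": nearest_rank_quantile(values, 0.1),
        "median": nearest_rank_quantile(values, 0.5),
        "90% quantile": nearest_rank_quantile(values, 0.9),
    }
    for key, values in sorted(groups.items())
}

for key, row in exact.items():
    print(key, row)
```

Under that rule, every group's exact 10% quantile is 1, its median equals its grouping key, and its 90% quantile is 5. The released table above shows 5 in the 10% column and 1 in the 90% column.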

@Shoeboxam Shoeboxam force-pushed the 1497-quantile-usability branch 13 times, most recently from 3e438d2 to dea3b4a Compare April 22, 2024 15:32
:language: python
:start-after: init-domain
:end-before: /init-domain

Member

A separate docs page would be useful for explaining the descriptors on frame domains.

@Shoeboxam Shoeboxam force-pushed the 1497-quantile-usability branch 2 times, most recently from a907f6c to 336b341 Compare April 25, 2024 03:56
@Shoeboxam Shoeboxam force-pushed the 1500-polars-quantile-examples branch from 3ade1d4 to 7972c75 Compare June 13, 2024 23:10
@Shoeboxam Shoeboxam force-pushed the 1500-polars-quantile-examples branch from 7972c75 to e54b865 Compare June 14, 2024 03:26
@mccalluc
Contributor Author

Supplanted by Gurman's work. Closing.

@mccalluc mccalluc closed this Jul 18, 2024