by groups of A, subset dataframe by value of B #129

floswald · 2019-05-27T17:57:35Z

Hi,

This is an issue I described over at Discourse.

Suppose I have this:

using DataFrames, DataFramesMeta

julia> df = DataFrame(A = repeat(1:3,3), B = rand(1:9,9),C = rand(9))
9×3 DataFrame
│ Row │ A     │ B     │ C        │
│     │ Int64 │ Int64 │ Float64  │
├─────┼───────┼───────┼──────────┤
│ 1   │ 1     │ 1     │ 0.676259 │
│ 2   │ 2     │ 5     │ 0.675797 │
│ 3   │ 3     │ 9     │ 0.597738 │
│ 4   │ 1     │ 6     │ 0.593364 │
│ 5   │ 2     │ 2     │ 0.156473 │
│ 6   │ 3     │ 7     │ 0.859696 │
│ 7   │ 1     │ 9     │ 0.951905 │
│ 8   │ 2     │ 6     │ 0.196368 │
│ 9   │ 3     │ 1     │ 0.292369 │

Suppose for each group of A I want to discard the row where B is largest within that group:

julia> @linq df |>
           groupby(:A) |>
               where( :B != maximum(:B) )
GroupedDataFrame with 3 groups based on key: A
First Group (3 rows): A = 1
│ Row │ A     │ B     │ C        │
│     │ Int64 │ Int64 │ Float64  │
├─────┼───────┼───────┼──────────┤
│ 1   │ 1     │ 1     │ 0.676259 │
│ 2   │ 1     │ 6     │ 0.593364 │
│ 3   │ 1     │ 9     │ 0.951905 │
⋮
Last Group (3 rows): A = 3
│ Row │ A     │ B     │ C        │
│     │ Int64 │ Int64 │ Float64  │
├─────┼───────┼───────┼──────────┤
│ 1   │ 3     │ 9     │ 0.597738 │
│ 2   │ 3     │ 7     │ 0.859696 │
│ 3   │ 3     │ 1     │ 0.292369 │

No, every group comes back unfiltered. I want this (without hard-coding 9 and without having to index the grouped dataframe...):

julia> g = @linq df |>
           groupby(:A)

julia> filter(x -> (x[:B] < 9),g[1])
2×3 DataFrame
│ Row │ A     │ B     │ C        │
│     │ Int64 │ Int64 │ Float64  │
├─────┼───────┼───────┼──────────┤
│ 1   │ 1     │ 1     │ 0.676259 │
│ 2   │ 1     │ 6     │ 0.593364 │

How can I best achieve that? Thanks.
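
For reference, here is a minimal sketch of one way to get the desired result with plain DataFrames.jl. It assumes DataFrames.jl 1.0 or later, where `subset` accepts a `GroupedDataFrame`. It is not necessarily what the fix referenced later in this thread implemented. The `B` and `C` values are copied from the table above so the example is deterministic, and the name `result` is only illustrative.

```julia
using DataFrames

# Deterministic copy of the example data (the original post used rand)
df = DataFrame(A = repeat(1:3, 3),
               B = [1, 5, 9, 6, 2, 7, 9, 6, 1],
               C = [0.676259, 0.675797, 0.597738, 0.593364, 0.156473,
                    0.859696, 0.951905, 0.196368, 0.292369])

# For each group of A, keep the rows whose B is not the group maximum.
# The anonymous function receives the group's B column as a vector,
# so the comparison has to be broadcast with `.!=` to get one Bool per row.
result = subset(groupby(df, :A), :B => b -> b .!= maximum(b))
```

With this data each group loses exactly one row, so `result` has 6 rows. If several rows in a group tie for the maximum of `B`, all of them are dropped.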

nalimilan · 2019-05-31T21:18:29Z

As noted on Discourse, where should probably operate over rows rather than groups, since that's more powerful.
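
A small sketch of why the original call returned every group untouched, assuming the `where` of that time evaluated its condition once per group rather than once per row: comparing a whole column to a scalar with `!=` yields a single `Bool`, while the broadcast form `.!=` yields one `Bool` per row. The vector `b` below is just the `B` column of the group `A = 1` from the example.

```julia
b = [1, 6, 9]                    # B values of the group A = 1

whole_group = b != maximum(b)    # a single Bool: is the vector unequal to the scalar 9?
per_row     = b .!= maximum(b)   # one Bool per row: which entries differ from 9?
```

Here `whole_group` is `true`, which under the assumption above keeps the entire group, whereas `per_row` marks only the first two rows as kept.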

pdeffebach · 2021-03-07T19:54:07Z

Closed. This was fixed in #192

pdeffebach closed this as completed Mar 7, 2021
