Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-equi join showing columns in a not so perfect style #5593

Open
hope-data-science opened this issue Feb 13, 2023 · 2 comments
Open

Non-equi join showing columns in a not so perfect style #5593

hope-data-science opened this issue Feb 13, 2023 · 2 comments
Labels
non-equi joins rolling, overlapping, non-equi joins

Comments

@hope-data-science
Copy link

hope-data-science commented Feb 13, 2023

In the latest version of dplyr, non-equi joins could be realized by:

library(dplyr)

transactions <- tibble(
  company = c("A", "A", "B", "B"),
  year = c(2019, 2020, 2021, 2023),
  revenue = c(50, 4, 10, 12)
)
transactions
#> # A tibble: 4 × 3
#>   company  year revenue
#>   <chr>   <dbl>   <dbl>
#> 1 A        2019      50
#> 2 A        2020       4
#> 3 B        2021      10
#> 4 B        2023      12

companies <- tibble(
  id = c("A", "B", "B"),
  since = c(1973, 2009, 2022),
  name = c("Patagonia", "RStudio", "Posit")
)

companies
#> # A tibble: 3 × 3
#>   id    since name     
#>   <chr> <dbl> <chr>    
#> 1 A      1973 Patagonia
#> 2 B      2009 RStudio  
#> 3 B      2022 Posit


transactions |>
  inner_join(companies, join_by(company == id, year >= since))
#> # A tibble: 5 × 5
#>   company  year revenue since name     
#>   <chr>   <dbl>   <dbl> <dbl> <chr>    
#> 1 A        2019      50  1973 Patagonia
#> 2 A        2020       4  1973 Patagonia
#> 3 B        2021      10  2009 RStudio  
#> 4 B        2023      12  2009 RStudio  
#> 5 B        2023      12  2022 Posit

Using data.table, I yield:

library(data.table)
#> 
#> Attaching package: 'data.table'
#> The following objects are masked from 'package:dplyr':
#> 
#>     between, first, last
setDT(transactions)
setDT(companies)
transactions[companies, on = .(company == id, year >= since)]
#>    company year revenue      name
#> 1:       A 1973      50 Patagonia
#> 2:       A 1973       4 Patagonia
#> 3:       B 2009      10   RStudio
#> 4:       B 2009      12   RStudio
#> 5:       B 2022      12     Posit

The operation is the same, but the results is not desirable as that of dplyr's, and dplyr also supports to keep all the columns using argument keep = TRUE. How can I get something similar in data.table?

Thanks.

@davidbudzynski
Copy link
Contributor

You can do it like this:

library(data.table)

(transactions <- data.table(
    company = c("A", "A", "B", "B"),
    year = c(2019, 2020, 2021, 2023),
    revenue = c(50, 4, 10, 12)
))
#>    company  year revenue
#>     <char> <num>   <num>
#> 1:       A  2019      50
#> 2:       A  2020       4
#> 3:       B  2021      10
#> 4:       B  2023      12

(companies <- data.table(
    id = c("A", "B", "B"),
    since = c(1973, 2009, 2022),
    name = c("Patagonia", "RStudio", "Posit")
))
#>        id since      name
#>    <char> <num>    <char>
#> 1:      A  1973 Patagonia
#> 2:      B  2009   RStudio
#> 3:      B  2022     Posit

transactions[
    companies,
    mget(c(names(transactions), "since")),
    on = .(company == id, year >= since)
]
#>    company  year revenue since
#>     <char> <num>   <num> <num>
#> 1:       A  1973      50  1973
#> 2:       A  1973       4  1973
#> 3:       B  2009      10  2009
#> 4:       B  2009      12  2009
#> 5:       B  2022      12  2022

Created on 2023-02-15 with reprex v2.0.2

I based my answer on this SO post

@tdhock
Copy link
Member

tdhock commented Mar 21, 2023

there is a solution in #3093 which is in progress.

@jangorecki jangorecki added the non-equi joins rolling, overlapping, non-equi joins label Jan 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
non-equi joins rolling, overlapping, non-equi joins
Projects
None yet
Development

No branches or pull requests

4 participants