Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update test for dplyr 1.0.8 #274

Closed
wants to merge 2 commits into from

Conversation

romainfrancois
Copy link

We're about to release dplyr 1.0.8 (tidyverse/dplyr#6080) which would cause this failure in one test:

> checking installed package size ... NOTE
    installed size is 18.3Mb
    sub-directories of 1Mb or more:
      libs  16.5Mb

── Test failures ────────────────────────────────────────────────────────────────────────────── testthat ────

> library(testthat)
> library(bigsnpr)
Loading required package: bigstatsr
> 
> for (k in 1:9)
+   test_check("bigsnpr", filter = paste0(k, '-'))
══ Skipped tests ═══════════════════════════════════════════════════════════════
• is_cran is TRUE (1)

══ Failed tests ════════════════════════════════════════════════════════════════
── Failure (test-1-readBGEN.R:66:3): same variant infos as with QCTOOL ─────────
dplyr::mutate(test$map[-19, 1:6], chromosome = as.integer(chromosome)) not identical to dplyr::as_tibble(...).
Names: 3 string mismatches
Component 1: Modes: numeric, character
Component 1: target is numeric, current is character
Component 2: Modes: character, numeric
Component 2: target is character, current is numeric
Component 3: 198 string mismatches

[ FAIL 1 | WARN 0 | SKIP 1 | PASS 226 ]
Error: Test failures
Execution halted

This is because of this fix: tidyverse/dplyr#6035

@romainfrancois
Copy link
Author

This pull request fixes the test.

@privefl
Copy link
Owner

privefl commented Nov 15, 2021

Hi Romain.
Could you quickly explain what is the problem?
It seems it is not fixed yet. Maybe I can try to fix it myself.

@romainfrancois
Copy link
Author

The test fails because dplyr has fixed this issue: tidyverse/dplyr#5967

Comment on lines 67 to 70
dplyr::relocate(dplyr::mutate(test$map[-19, 1:6], chromosome = as.integer(chromosome)), rsid, .before = "chromosome"),
dplyr::as_tibble(dplyr::transmute(
snp_info[-19, ], chromosome, marker.ID = alternate_ids, rsid,
physical.pos = position, allele1 = alleleA, allele2 = alleleB))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better fix might be to replace the transmute() in the next line with a rename() followed by a select() that selects the columns in the order seen here

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry I still don't get what the error is here..
chromosome is the first variable, and should remain the first variable after the mutate(), which has always been the case, and it seems it is designed to remain like this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take transmute(df, chromosome, marker.ID = alternate_ids, rsid) as the example, which is identical to transmute(df, chromosome = chromosome, marker.ID = alternate_ids, rsid = rsid).

By accident, that was previously putting the columns in the order they are specified, ignoring whether or not any of them already existed, so you'd get a column order of (chromosome, marker.ID, rsid).

This is inconsistent with how mutate() works, where the invariants are:

  • Modified columns that already exist are overwritten in place (their order can't change)
  • Completely new columns are added to the end

So the correct resulting order should have been (chromosome, rsid, marker.ID) (assuming that in the original data frame, chromosome was before rsid), and that is now the case.

Copy link
Owner

@privefl privefl Nov 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow.. Have you asked the community about this?
In transmute(), where you basically only keep the columns you are interested in (and forget about all the others), you kind of forget about the order as well and just want the order that you specify (at least this is what I want).
Please keep it that way. This would break so much code..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, there's no need for transmute() and it can be replaced by select() which can rename as well. , I've updated the pull request.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you still plan to change the behaviour of transmute() in the next release of {dplyr}?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@privefl Hi, as explained in tidyverse/dplyr#6086, the behavior seems to have been modified to the original behavior (the behavior of the current CRAN release) again in tidyverse/dplyr#6087 to make it consistent with the past behavior of transmute().

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Then I can close this.

@privefl
Copy link
Owner

privefl commented Nov 15, 2021

I am trying to reproduce the issue with the GitHub version of {dplyr}.
However, when I try to install it with remotes::install_github("tidyverse/dplyr"),
I get the following error: could not find function "on_load".

@romainfrancois
Copy link
Author

@privefl this is probably because you need to update rlang as well, e.g. pak::pak("r-lib/rlang")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants