Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pairwise comparisons #352

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Pairwise comparisons #352

wants to merge 10 commits into from

Conversation

malecki
Copy link
Contributor

@malecki malecki commented Jan 3, 2019

No description provided.

RespondentIdeology = c("Very Conservative")
)
)
expected_chisq = structure(5.74120651376478, .Names = "X-squared")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the changes in this file should move to test-cube-pairwise, but I wanted to make the changes more evident in the pr.

R/cube-comparisons.R Outdated Show resolved Hide resolved
R/cube-comparisons.R Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jan 3, 2019

Codecov Report

Merging #352 into master will decrease coverage by <.01%.
The diff coverage is 95.74%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #352      +/-   ##
==========================================
- Coverage   89.97%   89.96%   -0.01%     
==========================================
  Files         115      116       +1     
  Lines        7240     7277      +37     
==========================================
+ Hits         6514     6547      +33     
- Misses        726      730       +4
Impacted Files Coverage Δ
R/cube-residuals.R 97.5% <ø> (-1.47%) ⬇️
R/cube-comparisons.R 95.74% <95.74%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c933492...78363b5. Read the comment docs.

@codecov
Copy link

codecov bot commented Jan 3, 2019

Codecov Report

Merging #352 into master will decrease coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #352      +/-   ##
==========================================
- Coverage   90.47%   90.42%   -0.06%     
==========================================
  Files         120      116       -4     
  Lines        7362     7362              
==========================================
- Hits         6661     6657       -4     
- Misses        701      705       +4
Impacted Files Coverage Δ
R/cube-residuals.R 97.77% <ø> (-1.25%) ⬇️
R/cube-comparisons.R 100% <100%> (ø)
R/dichotomize.R 52.94% <0%> (-27.06%) ⬇️
R/members.R 76% <0%> (-12.89%) ⬇️
R/variables.R 94.11% <0%> (-5.89%) ⬇️
R/folders.R 83.95% <0%> (-5.21%) ⬇️
R/new-dataset.R 80.88% <0%> (-4.49%) ⬇️
R/batches.R 78.43% <0%> (-3.06%) ⬇️
R/dataset-update.R 59.61% <0%> (-2.21%) ⬇️
R/project-folder.R 78.12% <0%> (-1.88%) ⬇️
... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62e4a33...10a6d89. Read the comment docs.

.Dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
referencePvals <- structure(rep(1, 16),
.Dim = c(4L, 4L),
.Dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could be re-written as
cubify(0, dims = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
and
cubify(1, dims = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
respectively (and most or all other structure(...) calls could be as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should they? is cubify preferred for developer readability? As only an occasional contributor I wasn’t clear on what cubify did and structure was clearer to me ”this is the reference object”. Can change if you like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find cubify to be a bit more readable, and it saves arguments — it's not necessary of course but those structures stood out to me and I had to copy/paste them and run them to confirm they were doing what they were doing.

It's not relevant for these two, but for the ones that are not a single value it also would be nice if the line breaks matched the rows breaks to make it easier to read. I found myself having to mentally find the diagonal and shift forward/back from there to see where I was in the matrix.

expect_equal(
compareCols(
gender_x_ideology,
baseline = "Very liberal",
x = "Very Conservative"
),
expected_zScores
expected_chisq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I reading this right that we're changing the behavior of compareCols() from returning z-scores for each cell (like zScores() does) to returning the return value from chisq.test? Is there any reason not to include stdres in the allowed value parameters so that it can return what we have been already returning?

My suspicion from talking to a few of the offices about this is that the p-value derived from those z-scores are what people are thinking they want when they compare one column against another (and the same in the pair-wise compare all columns case).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we are transitioning away from z-scores to just providing the test statistic we should match that behavior elsewhere. I'm not certain we actually want to make that transition, however.

#'
#' @examples
#' \dontrun{
#' some_cube <- crunch_example_data(cat_by_cat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' some_cube <- crunch_example_data(cat_by_cat)
#' some_cube <- crtabs(~ educ + gender, ds)

#' Use the alternative Wishart method of forming the matrix of column- or row-wise
#' comparison Chi-squared test statistics for a categorical-by-categorical
#' contingency table.
#'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add:

The null hypothesis is that all of the rows (or columns) are equal to each other. The test statistic matrix that is returned when requested is a measure of closeness between the pair of rows (or columns) given by their names. The p-value matrix that is returned are similarly the probabilities of finding the observed or more extreme results while the null hypothesis is true for each pair of rows (or columns).

#'
#' Generate a matrix of pairwise comparisons of rows or columns, each against
#' the others.
#'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add:

The null hypothesis is that for each pair of rows (or columns) those two specific rows (or columns) in the pair are equal to each other. The test statistic matrix that is returned when requested is a measure of closeness between the pair of rows (or columns) given by their names. The p-value matrix that is returned are similarly the probabilities of finding the observed or more extreme results while the null hypothesis is true for each pair of rows (or columns).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants