Pairwise comparisons #352

malecki · 2019-01-03T16:12:03Z

No description provided.

malecki · 2019-01-03T16:27:43Z

tests/testthat/test-cube-residuals-zed-scores.R

-            RespondentIdeology = c("Very Conservative")
-        )
-    )
+    expected_chisq = structure(5.74120651376478, .Names = "X-squared")


Most of the changes in this file should move to test-cube-pairwise, but I wanted to make the changes more evident in the PR.

R/cube-comparisons.R

codecov · 2019-01-03T19:49:03Z

Codecov Report

Merging #352 into master will decrease coverage by <.01%.
The diff coverage is 95.74%.

@@            Coverage Diff             @@
##           master     #352      +/-   ##
==========================================
- Coverage   89.97%   89.96%   -0.01%     
==========================================
  Files         115      116       +1     
  Lines        7240     7277      +37     
==========================================
+ Hits         6514     6547      +33     
- Misses        726      730       +4

Impacted Files	Coverage Δ
R/cube-residuals.R	`97.5% <ø> (-1.47%)`	⬇️
R/cube-comparisons.R	`95.74% <95.74%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c933492...78363b5. Read the comment docs.

codecov · 2019-01-03T19:49:03Z

Codecov Report

Merging #352 into master will decrease coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #352      +/-   ##
==========================================
- Coverage   90.47%   90.42%   -0.06%     
==========================================
  Files         120      116       -4     
  Lines        7362     7362              
==========================================
- Hits         6661     6657       -4     
- Misses        701      705       +4

Impacted Files	Coverage Δ
R/cube-residuals.R	`97.77% <ø> (-1.25%)`	⬇️
R/cube-comparisons.R	`100% <100%> (ø)`
R/dichotomize.R	`52.94% <0%> (-27.06%)`	⬇️
R/members.R	`76% <0%> (-12.89%)`	⬇️
R/variables.R	`94.11% <0%> (-5.89%)`	⬇️
R/folders.R	`83.95% <0%> (-5.21%)`	⬇️
R/new-dataset.R	`80.88% <0%> (-4.49%)`	⬇️
R/batches.R	`78.43% <0%> (-3.06%)`	⬇️
R/dataset-update.R	`59.61% <0%> (-2.21%)`	⬇️
R/project-folder.R	`78.12% <0%> (-1.88%)`	⬇️
... and 52 more

Powered by Codecov. Last update 62e4a33...10a6d89. Read the comment docs.

jonkeane · 2019-01-07T21:14:33Z

tests/testthat/test-cube-pairwise.R

+                           .Dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
+    referencePvals <- structure(rep(1, 16),
+                                .Dim = c(4L, 4L),
+                                .Dimnames = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))


These could be rewritten as
cubify(0, dims = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
and
cubify(1, dims = list(c("a", "b", "c", "d"), c("a", "b", "c", "d")))
respectively (and most or all of the other structure(...) calls could be as well).

Should they? Is cubify preferred for developer readability? As only an occasional contributor, I wasn't clear on what cubify did, and structure was clearer to me: "this is the reference object". I can change it if you like.

I find cubify a bit more readable, and it saves arguments. It's not necessary, of course, but those structures stood out to me, and I had to copy/paste and run them to confirm they were doing what they were supposed to do.

It's not relevant for these two, but for the ones that are not a single value, it would also be nice if the line breaks matched the row breaks to make them easier to read. I found myself having to mentally find the diagonal and shift forward or back from there to see where I was in the matrix.
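To illustrate the row-layout point: structure(..., .Dim = ...) fills values column-major, whereas base R's matrix(..., byrow = TRUE) lets each source line hold exactly one row. A minimal sketch with made-up values; it uses only base R and does not reproduce crunch's cubify(), whose exact behavior isn't shown in this thread.

```r
labs <- c("a", "b", "c")

# Column-major, as structure(.Dim = ...) expects: hard to read row by row
ref_structure <- structure(c(1, 4, 7, 2, 5, 8, 3, 6, 9),
                           .Dim = c(3L, 3L),
                           .Dimnames = list(labs, labs))

# The same matrix written row by row: each source line is one row
ref_byrow <- matrix(c(1, 2, 3,   # row "a"
                      4, 5, 6,   # row "b"
                      7, 8, 9),  # row "c"
                    nrow = 3, byrow = TRUE,
                    dimnames = list(labs, labs))

identical(ref_structure, ref_byrow)  # TRUE

# A constant reference matrix (like referencePvals) needs no value vector
abcd <- c("a", "b", "c", "d")
ref_ones <- matrix(1, nrow = 4, ncol = 4, dimnames = list(abcd, abcd))
```

With byrow = TRUE the diagonal lines up visually in the source, so there is no need to count positions from it.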

jonkeane · 2019-01-07T21:30:41Z

tests/testthat/test-cube-residuals-zed-scores.R

    expect_equal(
        compareCols(
            gender_x_ideology,
            baseline = "Very liberal",
            x = "Very Conservative"
        ),
-        expected_zScores
+        expected_chisq


Am I reading this right that we're changing the behavior of compareCols() from returning z-scores for each cell (like zScores() does) to returning the X-squared statistic from chisq.test()? Is there any reason not to include stdres in the allowed value parameters, so that it can return what we have already been returning?

My suspicion, from talking to a few of the offices about this, is that the p-values derived from those z-scores are what people think they want when they compare one column against another (and likewise in the pairwise, compare-all-columns case).

Also, if we are transitioning away from z-scores to just providing the test statistic, we should match that behavior elsewhere. I'm not certain we actually want to make that transition, however.
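Assuming compareCols() runs chisq.test() on the two selected columns (the PR's implementation isn't shown here), both candidate return values live on the same test object: $statistic is a single number named "X-squared" (the shape of expected_chisq), and $stdres holds one standardized residual, i.e. a z-score, per cell. A minimal sketch on made-up counts; correct = FALSE (no Yates correction) is an assumption, not necessarily what the PR does.

```r
# Made-up gender-by-ideology counts (illustrative only)
counts <- matrix(c(30, 45, 25,
                   20, 50, 35),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("Male", "Female"),
                                 ideology = c("Very liberal", "Moderate",
                                              "Very Conservative")))

# Keep only the two columns being compared
pair <- counts[, c("Very liberal", "Very Conservative")]

# Plain Pearson test on the 2 x 2 subtable
test <- chisq.test(pair, correct = FALSE)

test$statistic  # one number, named "X-squared"
test$p.value    # p-value for that single statistic
test$stdres     # per-cell standardized residuals (z-scores)
```

With only two rows, as here, every |stdres| equals the square root of X-squared, so the two outputs agree; with more rows, the per-cell z-scores carry information that the single statistic does not.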

jonkeane · 2019-01-07T21:33:50Z

R/cube-comparisons.R

+#'
+#' @examples
+#' \dontrun{
+#' some_cube <- crunch_example_data(cat_by_cat)


Suggested change

-#' some_cube <- crunch_example_data(cat_by_cat)
+#' some_cube <- crtabs(~ educ + gender, ds)

jonkeane · 2019-01-07T21:41:09Z

R/cube-comparisons.R

+#' Use the alternative Wishart method of forming the matrix of column- or row-wise
+#' comparison Chi-squared test statistics for a categorical-by-categorical 
+#' contingency table.
+#'


To add:

The null hypothesis is that all of the rows (or columns) are equal to each other. The test statistic matrix, returned when requested, measures how far apart each pair of rows (or columns), identified by their names, is. The p-value matrix, returned when requested, similarly gives for each pair of rows (or columns) the probability of observing results at least as extreme as those observed if the null hypothesis is true.
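The omnibus null described above (all columns equal) is the null of the ordinary Pearson chi-squared test of homogeneity. A minimal sketch of that single overall test with base R's chisq.test() on made-up counts; this is the standard test, not the Wishart method named in the roxygen block, which additionally produces a matrix of pairwise statistics evaluated against this overall null.

```r
# Made-up counts: rows are groups, columns are the categories being compared
counts <- matrix(c(30, 45, 25,
                   20, 50, 35),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("Male", "Female"),
                                 ideology = c("Very liberal", "Moderate",
                                              "Very Conservative")))

# Pearson test of homogeneity: H0 says every column has the same
# distribution over the rows
overall <- chisq.test(counts)

overall$parameter  # df = (rows - 1) * (cols - 1) = 2
overall$p.value
```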

jonkeane · 2019-01-07T21:42:12Z

R/cube-comparisons.R

+#' 
+#' Generate a matrix of pairwise comparisons of rows or columns, each against 
+#' the others.
+#'


To add:

The null hypothesis is that, for each pair of rows (or columns), the two rows (or columns) in that pair are equal to each other. The test statistic matrix, returned when requested, measures how far apart each pair of rows (or columns), identified by their names, is. The p-value matrix, returned when requested, similarly gives for each pair of rows (or columns) the probability of observing results at least as extreme as those observed if the null hypothesis is true.
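For the pairwise null above, a naive sketch runs one Pearson chi-squared test per pair of columns and fills symmetric statistic and p-value matrices. pairwise_col_chisq() is a hypothetical helper, not the PR's implementation, and it applies no multiple-comparison adjustment. Identical columns give a statistic of 0 and a p-value of 1, matching the all-0 and all-1 reference matrices discussed in the test review.

```r
# Hypothetical helper (not the PR's code): naive pairwise Pearson tests
# between every pair of columns of a two-way count table.
pairwise_col_chisq <- function(counts) {
    cols <- colnames(counts)
    k <- length(cols)
    # Diagonal: a column compared with itself has statistic 0, p-value 1
    stat <- matrix(0, k, k, dimnames = list(cols, cols))
    pval <- matrix(1, k, k, dimnames = list(cols, cols))
    for (i in seq_len(k - 1)) {
        for (j in (i + 1):k) {
            # Two-column subtable; correct = FALSE skips the Yates correction
            test <- chisq.test(counts[, c(i, j)], correct = FALSE)
            # Fill both triangles so the matrices are symmetric
            stat[i, j] <- stat[j, i] <- unname(test$statistic)
            pval[i, j] <- pval[j, i] <- test$p.value
        }
    }
    list(statistic = stat, p.value = pval)
}

# Made-up counts; columns "a" and "c" are identical
counts <- matrix(c(30, 45, 30,
                   20, 50, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Male", "Female"), c("a", "b", "c")))
res <- pairwise_col_chisq(counts)
res$statistic["a", "c"]  # 0
res$p.value["a", "c"]    # 1
```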

malecki commented Jan 3, 2019

nealrichardson reviewed Jan 3, 2019

R/cube-comparisons.R (outdated, resolved)

R/cube-comparisons.R (resolved)

malecki force-pushed the pairwise-comparisons branch from 574b1ff to 29b65c6 January 4, 2019 14:31

jonkeane suggested changes Jan 7, 2019

malecki force-pushed the pairwise-comparisons branch from 29b65c6 to a4adb4e January 19, 2019 19:07

malecki force-pushed the pairwise-comparisons branch from a4adb4e to ea7eb71 March 13, 2019 11:45

malecki added 10 commits April 6, 2019 19:24

add cubes for hirotsu reference data

a0e4d29

add pairwise stuff

2890d48

update some expectations for pairwise

28f956d

add reference pvals

ec20feb

change dependency

ef338d3

appease warnings

25d3d47

genearte some doc, spell

f4f2d16

check passes locally

e665946

restore coverage

3c34e60

add a cube with no row/col differences

10a6d89

malecki force-pushed the pairwise-comparisons branch from ea7eb71 to 10a6d89 April 7, 2019 02:25
