Chi-Square Test in R
This article is just a little note on R.
GOAL
To do chi-square test and residual analysis in R. If you want to know chi-square test and implement it in python, refer “Chi-Square Test in Python”.
Source Code
> test_data <- data.frame( groups = c("A","A", "B", "B", "C", "C"), result = c("success", "failure", "success", "failure", "success", "failure"), number = c(23, 100, 65, 44, 158, 119) ) > test_data groups result number 1 A success 23 2 A failure 100 3 B success 65 4 B failure 44 5 C success 158 6 C failure 119 > cross_data <- xtabs(number ~ ., test_data) > cross_data result groups failure success A 100 23 B 44 65 C 119 158 > result <- chisq.test(cross_data, correct=F) > result Pearson's Chi-squared test data: cross_data X-squared = 57.236, df = 2, p-value = 3.727e-13 > reesult$residuals result groups failure success A 4.571703 -4.727030 B -1.641673 1.697450 C -2.016609 2.085125 > result$stdres result groups failure success A 7.551524 -7.551524 B -2.663833 2.663833 C -4.296630 4.296630 > pnorm(abs(result$stdres), lower.tail = FALSE) * 2 result groups failure success A 4.301958e-14 4.301958e-14 B 7.725587e-03 7.725587e-03 C 1.734143e-05 1.734143e-05
functions
xtabs
xtabs() function is a function to create a contingency table from cross-classifying factors that contained in a data frame. “~” is the formula to specify variables that serve as aggregation criteria are described. And “~ .” means that this function use all variables (groups+result).
chisq.test
chisq.test() function is a function to return the test statistic, degree of freedom and p-value. The argument “correct” is continuity correction and set “correct” into F to suppress the continuity correction.
reesult$residuals
$residuals return standardized residuals.
result$stdres
$stdres return adjusted standardized residuals.
pnorm(abs(result$stdres), lower.tail = FALSE) * 2
This calculates p-value of standardized residuals.