Chi-Square Test in R
This article is just a little note on R.
GOAL
To do chi-square test and residual analysis in R. If you want to know chi-square test and implement it in python, refer “Chi-Square Test in Python”.
Source Code
> test_data <- data.frame(
groups = c("A","A", "B", "B", "C", "C"),
result = c("success", "failure", "success", "failure", "success",
"failure"),
number = c(23, 100, 65, 44, 158, 119)
)
> test_data
groups result number
1 A success 23
2 A failure 100
3 B success 65
4 B failure 44
5 C success 158
6 C failure 119
> cross_data <- xtabs(number ~ ., test_data)
> cross_data
result
groups failure success
A 100 23
B 44 65
C 119 158
> result <- chisq.test(cross_data, correct=F)
> result
Pearson's Chi-squared test
data: cross_data
X-squared = 57.236, df = 2, p-value = 3.727e-13
> reesult$residuals
result
groups failure success
A 4.571703 -4.727030
B -1.641673 1.697450
C -2.016609 2.085125
> result$stdres
result
groups failure success
A 7.551524 -7.551524
B -2.663833 2.663833
C -4.296630 4.296630
> pnorm(abs(result$stdres), lower.tail = FALSE) * 2
result
groups failure success
A 4.301958e-14 4.301958e-14
B 7.725587e-03 7.725587e-03
C 1.734143e-05 1.734143e-05functions
xtabs
xtabs() function is a function to create a contingency table from cross-classifying factors that contained in a data frame. “~” is the formula to specify variables that serve as aggregation criteria are described. And “~ .” means that this function use all variables (groups+result).
chisq.test
chisq.test() function is a function to return the test statistic, degree of freedom and p-value. The argument “correct” is continuity correction and set “correct” into F to suppress the continuity correction.
reesult$residuals
$residuals return standardized residuals.
result$stdres
$stdres return adjusted standardized residuals.
pnorm(abs(result$stdres), lower.tail = FALSE) * 2
This calculates p-value of standardized residuals.