Paired One-way ANOVA And Multiple Comparisons In Python
GOAL
To write a program that performs a paired (repeated-measures) one-way ANOVA (analysis of variance) and multiple comparisons using Python. For unpaired one-way ANOVA, please refer to the companion article “Unpaired One-way ANOVA And Multiple Comparisons In Python”.
What is ANOVA
ANOVA (analysis of variance) is a statistical hypothesis test that evaluates the effects of factors and their interactions by analyzing the differences between group means.
A full explanation is beyond the scope of this article. Please see the following resources.
- Explainer video on Youtube
- A Simple Introduction to ANOVA (Analyticsvidhya)
- Analysis of Variance (Online Statistics Education)
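One-way ANOVA is an ANOVA test that compares the means of three or more groups defined by a single factor. The null hypothesis is that the samples in all groups were taken from populations with the same mean. In the paired version, every subject is measured under every condition.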
Implementation
The following is an example implementation of paired one-way ANOVA.
Import Libraries
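Import the libraries below for the ANOVA test and the multiple comparisons.

    import statsmodels.api as sm
    from statsmodels.formula.api import ols
    import pandas as pd
    import numpy as np
    import statsmodels.stats.anova as anova
    # Needed later for Tukey's HSD test
    from statsmodels.stats.multicomp import pairwise_tukeyhsd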
Data Preparation
 | id_1 | id_2 | id_3 | id_4 | id_5 | id_6 | id_7 |
condition A | 85 | 90 | 88 | 69 | 78 | 98 | 87 |
condition B | 55 | 82 | 67 | 64 | 78 | 54 | 49 |
condition C | 46 | 95 | 59 | 80 | 52 | 73 | 70 |
test_data.csv
    85, 90, 88, 69, 78, 98, 87
    55, 82, 67, 64, 78, 54, 49
    46, 95, 59, 80, 52, 73, 70
Read and Set Data
    csv_line = []
    with open('test_data.csv') as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # float() ignores surrounding spaces and the trailing newline
            items = [float(x) for x in line.split(',')]
            print(items)
            csv_line.append(items)

    groupA = csv_line[0]
    groupB = csv_line[1]
    groupC = csv_line[2]
    tdata = pd.DataFrame({'A': groupA, 'B': groupB, 'C': groupC})
    tdata.index = range(1, 8)  # one row per subject (7 subjects)
    tdata
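As an alternative to parsing the file by hand, the same table can be loaded with pandas.read_csv. The following is only a sketch: it assumes the file layout shown above (one row per condition, no header), and it uses io.StringIO as a stand-in for test_data.csv so that the example is self-contained.

```python
import io

import pandas as pd

# Same contents as test_data.csv; io.StringIO stands in for the file here.
csv_text = """85, 90, 88, 69, 78, 98, 87
55, 82, 67, 64, 78, 54, 49
46, 95, 59, 80, 52, 73, 70
"""

# header=None: the file has no header row.
# skipinitialspace=True: ignore the space after each comma.
raw = pd.read_csv(io.StringIO(csv_text), header=None, skipinitialspace=True)

# Each CSV row is one condition, so transpose to get one column per condition.
tdata = raw.T.astype(float)
tdata.columns = ['A', 'B', 'C']
tdata.index = range(1, 8)  # subjects 1..7
print(tdata)
```

To read the real file, pass 'test_data.csv' to pd.read_csv instead of the io.StringIO object.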
To display a summary of the data, use DataFrame.describe().
tdata.describe()
ANOVA
    subjects = ['id1', 'id2', 'id3', 'id4', 'id5', 'id6', 'id7']
    # Long format: one row per (subject, condition) observation
    points = np.array(groupA + groupB + groupC)
    conditions = np.repeat(['A', 'B', 'C'], len(groupA))
    subjects = np.array(subjects + subjects + subjects)
    df = pd.DataFrame({'Point': points, 'Conditions': conditions, 'Subjects': subjects})

    # Repeated-measures ANOVA: Point is the response, Subjects identifies
    # the paired measurements, Conditions is the within-subject factor
    aov = anova.AnovaRM(df, 'Point', 'Subjects', ['Conditions'])
    result = aov.fit()
    print(result)
    >>
                    Anova
    ========================================
               F Value Num DF  Den DF Pr > F
    ----------------------------------------
    Conditions  5.4182 2.0000 12.0000 0.0211
    ========================================
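The long-format table can also be built directly from the wide table with DataFrame.melt. The following is a minimal sketch, assuming the wide layout used in this article (one column per condition, one row per subject). The names long_df, f_value and p_value are illustrative only. The anova_table attribute of the fitted result is a DataFrame, so individual values can be read from it.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Wide table: one column per condition, one row per subject.
tdata = pd.DataFrame({
    'A': [85, 90, 88, 69, 78, 98, 87],
    'B': [55, 82, 67, 64, 78, 54, 49],
    'C': [46, 95, 59, 80, 52, 73, 70],
}, index=range(1, 8), dtype=float)

# Turn the index into a 'Subjects' column, then melt to long format:
# one row per (subject, condition) pair.
long_df = (
    tdata.rename_axis('Subjects')
         .reset_index()
         .melt(id_vars='Subjects', var_name='Conditions', value_name='Point')
)

result = AnovaRM(long_df, depvar='Point', subject='Subjects',
                 within=['Conditions']).fit()

# Read the F value and p-value for the within-subject factor.
table = result.anova_table
f_value = table.loc['Conditions', 'F Value']
p_value = table.loc['Conditions', 'Pr > F']
print(f_value, p_value)
```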
The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.
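When the result is statistically significant, that is, when the p-value is below the chosen significance level (typically 0.05), perform a multiple comparison. Note that this p-value differs from the one an unpaired ANOVA gives for the same data, because the paired test removes between-subject variability from the error term and uses different degrees of freedom.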
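To see where the difference between the paired and the unpaired test comes from, the F statistics can be computed by hand from the sums of squares. This is only an illustrative sketch using NumPy and SciPy, not how statsmodels computes the result internally. The variable names are arbitrary. For the paired case it should reproduce the F value of 5.4182 with 2 and 12 degrees of freedom reported above.

```python
import numpy as np
from scipy import stats

# Rows are subjects id1..id7, columns are conditions A, B, C.
data = np.array([
    [85, 55, 46],
    [90, 82, 95],
    [88, 67, 59],
    [69, 64, 80],
    [78, 78, 52],
    [98, 54, 73],
    [87, 49, 70],
], dtype=float)
n_subjects, n_conditions = data.shape
grand_mean = data.mean()

# Partition of the total sum of squares.
ss_total = ((data - grand_mean) ** 2).sum()
ss_conditions = n_subjects * ((data.mean(axis=0) - grand_mean) ** 2).sum()
ss_subjects = n_conditions * ((data.mean(axis=1) - grand_mean) ** 2).sum()
df_conditions = n_conditions - 1

# Paired: between-subject variability is removed from the error term.
ss_error_paired = ss_total - ss_conditions - ss_subjects
df_error_paired = (n_subjects - 1) * (n_conditions - 1)
f_paired = (ss_conditions / df_conditions) / (ss_error_paired / df_error_paired)
p_paired = stats.f.sf(f_paired, df_conditions, df_error_paired)

# Unpaired: between-subject variability stays in the error term.
ss_error_unpaired = ss_total - ss_conditions
df_error_unpaired = n_conditions * (n_subjects - 1)
f_unpaired = (ss_conditions / df_conditions) / (ss_error_unpaired / df_error_unpaired)
p_unpaired = stats.f.sf(f_unpaired, df_conditions, df_error_unpaired)

print('paired  :', f_paired, df_conditions, df_error_paired, p_paired)
print('unpaired:', f_unpaired, df_conditions, df_error_unpaired, p_unpaired)
```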
Tukey’s multiple comparisons
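Use pairwise_tukeyhsd(endog, groups, alpha=0.05) from statsmodels.stats.multicomp for Tukey’s HSD (honestly significant difference) test. The argument endog is the response variable, an array of data (A[0], A[1], …, A[6], B[0], …, B[6], C[0], …, C[6]). The argument groups is a list of group names (A, A, …, A, B, …, B, C, …, C) that corresponds element by element to the response variable. The argument alpha is the significance level. Note that pairwise_tukeyhsd treats the groups as independent samples, so this comparison does not use the pairing by subject.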
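To make the layout of the two arguments concrete, the following small sketch builds endog and groups for this article's data with NumPy and prints the first few aligned pairs. It only illustrates the expected shapes and does not run the test itself.

```python
import numpy as np

groupA = [85, 90, 88, 69, 78, 98, 87]
groupB = [55, 82, 67, 64, 78, 54, 49]
groupC = [46, 95, 59, 80, 52, 73, 70]

# endog: all observations concatenated, condition by condition.
endog = np.hstack([groupA, groupB, groupC])

# groups: one label per observation, in the same order as endog.
groups = np.repeat(['A', 'B', 'C'], len(groupA))

# Each value is paired with the label at the same position.
for value, label in zip(endog[5:9], groups[5:9]):
    print(label, value)
```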
    def tukey_hsd(group_names, *args):
        # Concatenate all observations into one response array
        endog = np.hstack(args)
        # Build one group label per observation, in the same order
        groups_list = []
        for i in range(len(args)):
            for j in range(len(args[i])):
                groups_list.append(group_names[i])
        groups = np.array(groups_list)
        res = pairwise_tukeyhsd(endog, groups)
        print(res.pvalues)  # print only the p-values
        print(res)          # print the full result table

    # The function prints its results and returns nothing,
    # so call it directly instead of wrapping it in print()
    tukey_hsd(['A', 'B', 'C'], tdata['A'], tdata['B'], tdata['C'])
    >>
    [0.02259466 0.06511251 0.85313142]
    Multiple Comparison of Means - Tukey HSD, FWER=0.05
    =====================================================
    group1 group2 meandiff p-adj   lower    upper  reject
    -----------------------------------------------------
         A      B -20.8571 0.0226 -38.9533 -2.7609   True
         A      C -17.1429 0.0651 -35.2391  0.9533  False
         B      C   3.7143 0.8531 -14.3819 21.8105  False
    -----------------------------------------------------
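Tukey's HSD as called above treats the three groups as independent samples. If the pairing by subject should also be used in the follow-up comparisons, one possible alternative (not the method used in this article) is to run paired t-tests for each pair of conditions and adjust the p-values for multiple testing. The sketch below assumes scipy.stats.ttest_rel for the paired tests and Holm's correction via statsmodels' multipletests. The variable names are illustrative.

```python
from itertools import combinations

import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

tdata = pd.DataFrame({
    'A': [85, 90, 88, 69, 78, 98, 87],
    'B': [55, 82, 67, 64, 78, 54, 49],
    'C': [46, 95, 59, 80, 52, 73, 70],
}, index=range(1, 8), dtype=float)

# All pairs of conditions: (A, B), (A, C), (B, C).
pairs = list(combinations(tdata.columns, 2))

# Paired t-test for each pair; rows of tdata are the matched subjects.
raw_p = [stats.ttest_rel(tdata[a], tdata[b]).pvalue for a, b in pairs]

# Holm's step-down correction controls the family-wise error rate.
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method='holm')

for (a, b), p, p_adj, r in zip(pairs, raw_p, adj_p, reject):
    print(a, b, 'raw p =', p, 'adjusted p =', p_adj, 'reject =', r)
```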