# Paired One-way ANOVA And Multiple Comparisons In Python

## GOAL

The goal is to write a program that performs paired one-way ANOVA (analysis of variance) and multiple comparisons in Python. For unpaired one-way ANOVA, please refer to another article, “Unpaired One-way ANOVA And Multiple Comparisons In Python”.

## What is ANOVA

ANOVA (analysis of variance) is a statistical hypothesis test that analyzes the differences between group means within a sample in order to determine the effects of factors and their interactions.
A detailed explanation is beyond the scope of this article.

One-way ANOVA compares the means of three or more samples. The null hypothesis is that the samples in all groups were drawn from populations with the same mean. In the paired (repeated-measures) version, the same subjects are measured under every condition.

## Implementation

The following is an example implementation of paired one-way ANOVA.

### Import Libraries

Import the libraries below for the ANOVA test.

```
import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd
import numpy as np
import statsmodels.stats.anova as anova
# Needed later for Tukey's HSD test
from statsmodels.stats.multicomp import pairwise_tukeyhsd
```

### Data Preparation

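| | id_1 | id_2 | id_3 | id_4 | id_5 | id_6 | id_7 |
|---|---|---|---|---|---|---|---|
| condition A | 85 | 90 | 88 | 69 | 78 | 98 | 87 |
| condition B | 55 | 82 | 67 | 64 | 78 | 54 | 49 |
| condition C | 46 | 95 | 59 | 80 | 52 | 73 | 70 |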

test_data.csv

```
85, 90, 88, 69, 78, 98, 87
55, 82, 67, 64, 78, 54, 49
46, 95, 59, 80, 52, 73, 70
```

```
csv_line = []
with open('test_data.csv') as f:
    for line in f:
        # Split each row on commas and convert every field to float;
        # float() ignores surrounding spaces and the trailing newline
        items = [float(x) for x in line.split(',')]
        print(items)
        csv_line.append(items)
```
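
```
# Each row of the file holds the scores for one condition
groupA = csv_line[0]
groupB = csv_line[1]
groupC = csv_line[2]
tdata = pd.DataFrame({'A': groupA, 'B': groupB, 'C': groupC})
tdata.index = range(1, 8)  # one row per subject (7 subjects)
tdata
```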
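
As an alternative to parsing the file by hand, pandas can read the CSV directly. The following is a minimal sketch, assuming the same three-row layout as test_data.csv. The data is inlined through `io.StringIO` so the example runs on its own, and the names `csv_text`, `raw`, and `wide` are illustrative only and not part of the original code.

```python
import io

import pandas as pd

# Same contents as test_data.csv, inlined so the sketch is self-contained
csv_text = """85, 90, 88, 69, 78, 98, 87
55, 82, 67, 64, 78, 54, 49
46, 95, 59, 80, 52, 73, 70
"""

# header=None: the file has no header row
# skipinitialspace=True: drop the blank that follows each comma
raw = pd.read_csv(io.StringIO(csv_text), header=None, skipinitialspace=True)

# Rows are conditions and columns are subjects, so transpose
# to get one column per condition and one row per subject
wide = raw.T
wide.columns = ['A', 'B', 'C']
wide.index = range(1, len(wide) + 1)
print(wide)
```

With a real file, `io.StringIO(csv_text)` would be replaced by the file path.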

To display a summary of the data, use DataFrame.describe().

`tdata.describe()`

### ANOVA

```
subjects = ['id1', 'id2', 'id3', 'id4', 'id5', 'id6', 'id7']
# Stack the scores in long format: all of A, then B, then C
points = np.array(groupA + groupB + groupC)
conditions = np.repeat(['A', 'B', 'C'], len(groupA))
subjects = np.array(subjects + subjects + subjects)
df = pd.DataFrame({'Point': points, 'Conditions': conditions, 'Subjects': subjects})
```
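
AnovaRM expects the data in long format, with one row per observation. If the scores already sit in a wide DataFrame with one column per condition, `DataFrame.melt` can produce the same layout. The following is a minimal sketch of that reshaping; the names `wide` and `long_df` are illustrative and do not appear in the original code.

```python
import pandas as pd

# Wide layout: one column per condition, one row per subject
wide = pd.DataFrame({
    'A': [85, 90, 88, 69, 78, 98, 87],
    'B': [55, 82, 67, 64, 78, 54, 49],
    'C': [46, 95, 59, 80, 52, 73, 70],
})
wide['Subjects'] = ['id%d' % i for i in range(1, 8)]

# Long layout: one row per (subject, condition) pair
long_df = wide.melt(id_vars='Subjects', var_name='Conditions', value_name='Point')
print(long_df.head())
```
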
```
aov = anova.AnovaRM(df, 'Point', 'Subjects', ['Conditions'])
result = aov.fit()

print(result)

>>                 Anova
========================================
           F Value Num DF  Den DF Pr > F
----------------------------------------
Conditions  5.4182 2.0000 12.0000 0.0211
========================================
```

The smaller the p-value, the stronger the evidence against the null hypothesis.
If the result is statistically significant, that is, the p-value is below the chosen significance level (typically 0.05), perform a multiple comparison. Note that paired ANOVA and unpaired ANOVA give different p-values for the same data.
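
To see where the F value in the table above comes from, and why the unpaired result differs, the sums of squares can be computed by hand. The following is a minimal NumPy sketch using the scores from the data table; the variable names are illustrative. The paired analysis removes the between-subject variation from the error term, while the unpaired analysis leaves it in.

```python
import numpy as np

# Rows are conditions A, B, C; columns are subjects id_1 .. id_7
scores = np.array([
    [85, 90, 88, 69, 78, 98, 87],
    [55, 82, 67, 64, 78, 54, 49],
    [46, 95, 59, 80, 52, 73, 70],
], dtype=float)
k, n = scores.shape  # k conditions, n subjects
grand_mean = scores.mean()

# Partition the total sum of squares
ss_total = ((scores - grand_mean) ** 2).sum()
ss_conditions = n * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
ss_subjects = k * ((scores.mean(axis=0) - grand_mean) ** 2).sum()

# Paired: between-subject variation is removed from the error term
ss_error = ss_total - ss_conditions - ss_subjects
df_conditions = k - 1
df_error = (k - 1) * (n - 1)
f_paired = (ss_conditions / df_conditions) / (ss_error / df_error)

# Unpaired: between-subject variation stays in the error term
df_within = k * (n - 1)
f_unpaired = (ss_conditions / df_conditions) / ((ss_total - ss_conditions) / df_within)

print(f"paired   F({df_conditions}, {df_error}) = {f_paired:.4f}")
print(f"unpaired F({df_conditions}, {df_within}) = {f_unpaired:.4f}")
```

The paired value reproduces the 5.4182 with 2 and 12 degrees of freedom reported by AnovaRM. The unpaired statistic uses 18 denominator degrees of freedom and a larger error term, so its F value and p-value differ.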

### Tukey’s multiple comparisons

Use pairwise_tukeyhsd(endog, groups, alpha=0.05) for Tukey’s HSD (honestly significant difference) test. The argument endog is the response variable, an array of data (A, A, …, A, B, …, B, C, …, C). The argument groups is the list of group names (A, A, …, A, B, …, B, C, …, C), one for each value of the response variable. The argument alpha is the significance level.

```
def tukey_hsd(group_names, *args):
    # Concatenate all groups into one response array
    endog = np.hstack(args)
    # Build the matching array of group labels, one per observation
    groups_list = []
    for i in range(len(args)):
        for j in range(len(args[i])):
            groups_list.append(group_names[i])
    groups = np.array(groups_list)
    res = pairwise_tukeyhsd(endog, groups)
    print(res.pvalues)  # print only the p-values
    print(res)  # print the full result table
```

```
tukey_hsd(['A', 'B', 'C'], tdata['A'], tdata['B'], tdata['C'])
>> [0.02259466 0.06511251 0.85313142]
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================
group1 group2 meandiff p-adj   lower    upper  reject
-----------------------------------------------------
     A      B -20.8571 0.0226 -38.9533 -2.7609   True
     A      C -17.1429 0.0651 -35.2391  0.9533  False
     B      C   3.7143 0.8531 -14.3819 21.8105  False
-----------------------------------------------------
```
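
The result object can also be inspected programmatically instead of only being printed. The following is a minimal sketch that lists which pairs are rejected, assuming the `groupsunique`, `meandiffs`, and `reject` attributes of the statsmodels result object and the upper-triangle pair order (A-B, A-C, B-C) seen in the table above. The names `scores` and `significant` are illustrative. Note that `pairwise_tukeyhsd` treats the groups as independent samples, so the subject pairing is not used in this step.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

scores = {
    'A': [85, 90, 88, 69, 78, 98, 87],
    'B': [55, 82, 67, 64, 78, 54, 49],
    'C': [46, 95, 59, 80, 52, 73, 70],
}
# Response values and one group label per value
endog = np.concatenate(list(scores.values()))
groups = np.repeat(list(scores), [len(v) for v in scores.values()])

res = pairwise_tukeyhsd(endog, groups, alpha=0.05)

# Pairs follow the upper-triangle order of the unique group names
i, j = np.triu_indices(len(res.groupsunique), k=1)
significant = []
for g1, g2, diff, rejected in zip(res.groupsunique[i], res.groupsunique[j],
                                  res.meandiffs, res.reject):
    print(f"{g1} vs {g2}: mean difference {diff:.4f}, reject={bool(rejected)}")
    if rejected:
        significant.append((str(g1), str(g2)))

print(significant)
```
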