# Unpaired One-way ANOVA And Multiple Comparisons In Python

## GOAL

This article shows how to run an unpaired one-way ANOVA (analysis of variance) and multiple comparisons in Python. For the paired case, please refer to the separate article “Paired One-way ANOVA And Multiple Comparisons In Python”.

## What is ANOVA?

ANOVA is a method of statistical hypothesis testing that analyzes the differences between group means within a sample in order to determine the effects of factors and their interactions.
A full explanation is too long for this article; please consult a dedicated statistics reference for the details.

One-way ANOVA is an ANOVA test that compares the means of three or more samples. The null hypothesis is that the samples in all groups were drawn from populations with the same mean.

## Implementation

The following is an example implementation of one-way ANOVA.

### Import Libraries

Import the libraries below for the ANOVA test.

```python
import pandas as pd
import numpy as np
import scipy as sp
import csv  # when you need to read csv data
from scipy import stats as st

import statsmodels.formula.api as smf
import statsmodels.api as sm
import statsmodels.stats.anova as anova  # for ANOVA
from statsmodels.stats.multicomp import pairwise_tukeyhsd  # for Tukey's multiple comparisons
```

### Data Preparing

| Group | Values |
| --- | --- |
| A | 85, 90, 88, 69, 78, 98, 87 |
| B | 55, 82, 67, 64, 78, 54, 49 |
| C | 46, 95, 59, 80, 52, 73, 70 |

test_data.csv

```
85, 90, 88, 69, 78, 98, 87
55, 82, 67, 64, 78, 54, 49
46, 95, 59, 80, 52, 73, 70
```

```python
csv_line = []
with open('test_data.csv') as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at the end of the file
        # float() ignores surrounding whitespace, including the trailing '\n'
        items = [float(item) for item in line.split(',')]
        print(items)
        csv_line.append(items)
```
```python
groupA = csv_line[0]
groupB = csv_line[1]
groupC = csv_line[2]

tdata = pd.DataFrame({'A': groupA, 'B': groupB, 'C': groupC})
# 1-based row labels; the index length must match the 7 observations per group
tdata.index = range(1, len(tdata) + 1)
tdata
```

To display summary statistics of the data, use `DataFrame.describe()`.

`tdata.describe()`
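
The manual parsing loop in the data preparation step can also be written with the standard-library `csv` module, which is imported at the top of the article. The following is a minimal sketch rather than the original method: `io.StringIO` stands in for `test_data.csv` so that the snippet runs on its own, and the helper name `read_groups` is made up for illustration.

```python
import csv
import io

# Stand-in for test_data.csv so the example is self-contained.
CSV_TEXT = """85, 90, 88, 69, 78, 98, 87
55, 82, 67, 64, 78, 54, 49
46, 95, 59, 80, 52, 73, 70
"""


def read_groups(file_obj):
    """Return one list of floats per non-empty CSV row (hypothetical helper)."""
    # skipinitialspace drops the blank that follows each comma in the file
    reader = csv.reader(file_obj, skipinitialspace=True)
    return [[float(value) for value in row] for row in reader if row]


rows = read_groups(io.StringIO(CSV_TEXT))
group_a, group_b, group_c = rows
print(group_a)
```

With a real file, an object opened with `open('test_data.csv', newline='')` would be passed in place of the `StringIO` object.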

### ANOVA

```python
f, p = st.f_oneway(tdata['A'], tdata['B'], tdata['C'])
print("F=%f, p-value = %f" % (f, p))

# >> F=4.920498, p-value = 0.019737
```

The smaller the p-value, the stronger the evidence against the null hypothesis.
When the result is statistically significant, that is, when the p-value is below the chosen significance level (typically 0.05), perform a multiple comparison to find out which groups differ.
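
To see what the F statistic measures, the value reported above can be reproduced by hand as the ratio of the between-group mean square to the within-group mean square. The sketch below uses only the standard library and the scores from this article. The function name `one_way_f` is illustrative, and it computes only F, not the p-value.

```python
from statistics import fmean

# Scores for the three groups used in this article.
group_a = [85, 90, 88, 69, 78, 98, 87]
group_b = [55, 82, 67, 64, 78, 54, 49]
group_c = [46, 95, 59, 80, 52, 73, 70]


def one_way_f(*samples):
    """F statistic of an unpaired one-way ANOVA (illustrative helper)."""
    k = len(samples)                        # number of groups
    n_total = sum(len(s) for s in samples)  # total number of observations
    grand_mean = fmean([x for s in samples for x in s])
    # Variation of the group means around the grand mean
    ss_between = sum(len(s) * (fmean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean
    ss_within = sum((x - fmean(s)) ** 2 for s in samples for x in s)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


f_stat = one_way_f(group_a, group_b, group_c)
print("F=%f" % f_stat)  # prints F=4.920498
```

A large F means the group means are spread out far more than the scatter inside each group would explain, which is what makes the p-value small.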

### Tukey’s multiple comparisons

Use `pairwise_tukeyhsd(endog, groups, alpha=0.05)` for Tukey’s HSD (honestly significant difference) test. The argument `endog` is the response variable, a flat array of all observations (A[0], A[1], …, A[6], B[0], …, B[6], C[0], …, C[6]). The argument `groups` is an array of group names (A, A, …, A, B, …, B, C, …, C) with one name per observation, in the same order as `endog`. `alpha` is the significance level.
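
To make the layout of these two arguments concrete, the sketch below builds them as plain Python lists from the scores in this article, without calling statsmodels. The dictionary name `samples` is only for illustration.

```python
# Scores for the three groups used in this article.
samples = {
    "A": [85, 90, 88, 69, 78, 98, 87],
    "B": [55, 82, 67, 64, 78, 54, 49],
    "C": [46, 95, 59, 80, 52, 73, 70],
}

endog = []   # all observations, one group after another
groups = []  # the group name of each observation, in the same order
for name, values in samples.items():
    endog.extend(values)
    groups.extend([name] * len(values))

print(len(endog), len(groups))  # both have one entry per observation
print(endog[6], groups[6])      # last observation of group A
print(endog[7], groups[7])      # first observation of group B
```

The two sequences must have the same length, since position `i` in `groups` names the group that `endog[i]` belongs to.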

```python
def tukey_hsd(group_names, *args):
    # Flatten all samples into one response array
    endog = np.hstack(args)
    # Repeat each group name once per observation in that group
    groups = np.repeat(group_names, [len(a) for a in args])
    res = pairwise_tukeyhsd(endog, groups)
    print(res.pvalues)  # print only the adjusted p-values
    print(res)  # print the full result table
    return res
```
```
res = tukey_hsd(['A', 'B', 'C'], tdata['A'], tdata['B'], tdata['C'])
>>[0.02259466 0.06511251 0.85313142]
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================
group1 group2 meandiff p-adj   lower    upper  reject
-----------------------------------------------------
     A      B -20.8571 0.0226 -38.9533 -2.7609   True
     A      C -17.1429 0.0651 -35.2391  0.9533  False
     B      C   3.7143 0.8531 -14.3819 21.8105  False
-----------------------------------------------------
```

## Supplement

If the result object has no `pvalues` attribute, check the version of statsmodels.

```python
import statsmodels
statsmodels.__version__
# >> 0.9.0
```

If the version is lower than 0.10.0, update statsmodels. Open a command prompt or terminal and run the command below.

```
pip install --upgrade statsmodels
# or pip3 install --upgrade statsmodels
```
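
The version check above can also be scripted. Comparing version strings directly is misleading, because `'0.9.0' < '0.10.0'` is false as a string comparison. The sketch below compares numeric tuples instead. `version_tuple` is a hypothetical helper that assumes a plain `major.minor.patch` string and ignores any suffix, and the `installed` value is hard-coded for illustration (in practice it would be `statsmodels.__version__`).

```python
import re


def version_tuple(version):
    """Leading numeric parts of a version string as a tuple (hypothetical helper)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch number is treated as 0
    return tuple(int(part) for part in match.groups(default="0"))


installed = "0.9.0"  # assumed value; use statsmodels.__version__ in practice
needs_upgrade = version_tuple(installed) < (0, 10, 0)
print(needs_upgrade)  # True
```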