# Unpaired One-way ANOVA And Multiple Comparisons In Python

## GOAL

To write a program for unpaired one-way ANOVA (analysis of variance) and multiple comparisons using Python. Please refer to another article, “Paired One-way ANOVA And Multiple Comparisons In Python”, for paired one-way ANOVA.

## What is ANOVA?

ANOVA is a method of statistical hypothesis testing that analyzes the differences between group means within a sample in order to determine the effects of factors and their interactions.

A full explanation is beyond the scope of this article. Please see the following sites.

- Explainer video on Youtube
- A Simple Introduction to ANOVA (Analyticsvidhya)
- Analysis of Variance (Online Statistics Education)

One-way ANOVA is an ANOVA test that compares the means of three or more samples. The null hypothesis is that the samples in all groups were taken from populations with the same mean.

## Implementation

The following is an implementation example of one-way ANOVA.

### Import Libraries

Import the libraries below for the ANOVA test.

```python
import pandas as pd
import numpy as np
import scipy as sp
import csv  # when you need to read csv data
from scipy import stats as st
import statsmodels.formula.api as smf
import statsmodels.api as sm
import statsmodels.stats.anova as anova  # for ANOVA
from statsmodels.stats.multicomp import pairwise_tukeyhsd  # for Tukey's multiple comparisons
```

### Data Preparation

| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| group A | 85 | 90 | 88 | 69 | 78 | 98 | 87 |
| group B | 55 | 82 | 67 | 64 | 78 | 54 | 49 |
| group C | 46 | 95 | 59 | 80 | 52 | 73 | 70 |

test_data.csv

```
85, 90, 88, 69, 78, 98, 87
55, 82, 67, 64, 78, 54, 49
46, 95, 59, 80, 52, 73, 70
```

### Read and Set Data

```python
csv_line = []
with open('test_data.csv') as f:
    for i in f:
        items = i.split(',')
        for j in range(len(items)):
            if '\n' in items[j]:
                items[j] = float(items[j][:-1])
            else:
                items[j] = float(items[j])
        print(items)
        csv_line.append(items)
```

```python
groupA = csv_line[0]
groupB = csv_line[1]
groupC = csv_line[2]
tdata = pd.DataFrame({'A': groupA, 'B': groupB, 'C': groupC})
# Each group has 7 values, so the 1-based index must run from 1 to 7.
tdata.index = range(1, len(tdata) + 1)
tdata
```
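The import list includes `csv`, but the manual parsing above does not use it. As a minimal sketch of an alternative, the same rows could be read with `csv.reader` from the standard library. The helper name `read_groups` is hypothetical, and the file contents are inlined through `io.StringIO` only so that the example runs without `test_data.csv` on disk.

```python
import csv
import io

# Contents of test_data.csv from this article, inlined so the sketch is self-contained.
CSV_TEXT = """85, 90, 88, 69, 78, 98, 87
55, 82, 67, 64, 78, 54, 49
46, 95, 59, 80, 52, 73, 70
"""


def read_groups(file_obj):
    """Return one list of floats per non-empty CSV row (hypothetical helper)."""
    # skipinitialspace drops the blank that follows each comma in the file.
    reader = csv.reader(file_obj, skipinitialspace=True)
    return [[float(value) for value in row] for row in reader if row]


group_a, group_b, group_c = read_groups(io.StringIO(CSV_TEXT))
print(group_a)
print(group_b)
print(group_c)
```

With a real file, `read_groups(open('test_data.csv', newline=''))` inside a `with` block would play the same role.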

If you want to display a summary of the data, use DataFrame.describe().

```python
tdata.describe()
```

### ANOVA

```python
f, p = st.f_oneway(tdata['A'], tdata['B'], tdata['C'])
print("F=%f, p-value = %f" % (f, p))
```

```
>> F=4.920498, p-value = 0.019737
```

The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.

When the result is statistically significant, that is, when the p-value is at or below the chosen significance level (typically 0.05), perform a multiple comparison.
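To make the F value less of a black box, here is a rough sketch that recomputes it by hand with NumPy from the between-group and within-group sums of squares. It is an illustration of the textbook one-way ANOVA formula under the assumption of the three groups listed in this article, not a replacement for `st.f_oneway`; all variable names are illustrative.

```python
import numpy as np

# The three groups from the data table in this article.
groups = [
    np.array([85, 90, 88, 69, 78, 98, 87], dtype=float),  # group A
    np.array([55, 82, 67, 64, 78, 54, 49], dtype=float),  # group B
    np.array([46, 95, 59, 80, 52, 73, 70], dtype=float),  # group C
]

k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean is from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)        # mean square with k - 1 degrees of freedom
ms_within = ss_within / (n_total - k)    # mean square with N - k degrees of freedom
f_value = ms_between / ms_within

print("F = %f (df = %d, %d)" % (f_value, k - 1, n_total - k))
```

The result should agree with the F value reported by `st.f_oneway` above. The p-value is then the upper-tail probability of the F distribution with those degrees of freedom, which `scipy.stats.f.sf(f_value, k - 1, n_total - k)` can provide.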

### Tukey’s multiple comparisons

Use pairwise_tukeyhsd(endog, groups, alpha=0.05) for Tukey’s HSD (honestly significant difference) test. The argument endog is the response variable, a flat array of all the data (A[0] A[1] … A[6] B[0] … B[6] C[0] … C[6]). The argument groups is a list of group names (A, A … A, B … B, C … C), one per element of the response variable. The argument alpha is the significance level.

```python
def tukey_hsd(group_names, *args):
    endog = np.hstack(args)
    groups_list = []
    for i in range(len(args)):
        for j in range(len(args[i])):
            groups_list.append(group_names[i])
    groups = np.array(groups_list)
    res = pairwise_tukeyhsd(endog, groups)
    print(res.pvalues)  # print only p-value
    print(res)  # print result
```

```python
print(tukey_hsd(['A', 'B', 'C'], tdata['A'], tdata['B'], tdata['C']))
```

```
>>[0.02259466 0.06511251 0.85313142]
Multiple Comparison of Means - Tukey HSD, FWER=0.05
=====================================================
group1 group2 meandiff p-adj   lower    upper  reject
-----------------------------------------------------
     A      B -20.8571 0.0226 -38.9533 -2.7609   True
     A      C -17.1429 0.0651 -35.2391  0.9533  False
     B      C   3.7143 0.8531 -14.3819 21.8105  False
-----------------------------------------------------
None
```
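As a possible alternative to the nested loops in `tukey_hsd`, the `endog` and `groups` arrays can be built directly with `np.concatenate` and `np.repeat`. This is only a sketch: the dictionary `samples` is an illustrative name, and its values are the three groups from this article.

```python
import numpy as np

# Group label -> observations, taken from the data table in this article.
samples = {
    "A": [85, 90, 88, 69, 78, 98, 87],
    "B": [55, 82, 67, 64, 78, 54, 49],
    "C": [46, 95, 59, 80, 52, 73, 70],
}

# endog: every observation in one flat array, group after group.
endog = np.concatenate([np.asarray(values, dtype=float) for values in samples.values()])

# groups: each label repeated once per observation in its group,
# so groups[i] names the group that endog[i] belongs to.
groups = np.repeat(list(samples.keys()), [len(values) for values in samples.values()])

print(endog)
print(groups)
```

These two arrays have the same length and could be passed to `pairwise_tukeyhsd(endog, groups, alpha=0.05)`. Unlike a DataFrame with one column per group, this layout also works when the groups have different sizes.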

## Supplement

If the ‘pvalues’ attribute of the result cannot be found, check the version of statsmodels.

```python
import statsmodels
statsmodels.__version__
```

```
>> 0.9.0
```

If the version is lower than 0.10.0, update statsmodels. Open a command prompt or terminal and enter the command below.

```
pip install --upgrade statsmodels
# or
pip3 install --upgrade statsmodels
```
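The version check can also be done in code. The sketch below compares version numbers as tuples of integers, because a plain string comparison would wrongly rank "0.9.0" above "0.10.0". The helpers `version_tuple` and `needs_upgrade` are hypothetical names, and the 0.10.0 threshold is the one stated in this article.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Leading numeric part of a version string, e.g. '0.10.0' -> (0, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def needs_upgrade(installed, minimum="0.10.0"):
    """True if the installed version is numerically lower than the minimum."""
    return version_tuple(installed) < version_tuple(minimum)


try:
    installed = metadata.version("statsmodels")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("statsmodels is not installed")
elif needs_upgrade(installed):
    print("statsmodels %s is older than 0.10.0, please upgrade" % installed)
else:
    print("statsmodels %s is recent enough" % installed)
```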