# Chi-Square Test in Python

## GOAL

To write a program that performs a chi-square test in Python.

## What is chi-square test?

The chi-square test, which here means "Pearson's chi-square test", is a statistical hypothesis test used for goodness-of-fit and for independence.

A goodness-of-fit test determines whether an observed frequency distribution is consistent with a theoretical distribution.
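As a small illustration of a goodness-of-fit test, the sketch below uses `scipy.stats.chisquare`. The die-roll counts are hypothetical values made up for this example and are not part of the data set used later in this article.

```python
from scipy import stats

# Hypothetical counts from 120 rolls of a die (illustrative data only).
observed = [18, 22, 20, 25, 15, 20]

# Under the null hypothesis of a fair die, each face is expected 120 / 6 = 20 times.
expected = [20] * 6

# chisquare returns the test statistic and the p-value.
# The degrees of freedom default to (number of categories - 1) = 5.
statistic, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(statistic, p_value)
```

A large p-value here would mean the observed counts give no reason to doubt the fair-die assumption.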

An independence test determines whether two categorical variables, whose observed counts are cross-tabulated in a contingency table (for example a 2×2 table), are independent of each other.
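For reference, the statistic used by the independence test can be written as follows. The symbols are notation introduced here for illustration: $O_{ij}$ is the observed count in row $i$ and column $j$, $E_{ij}$ the expected count under independence, $R_i$ and $C_j$ the row and column totals, $N$ the grand total, and $r$ and $c$ the numbers of rows and columns.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{dof} = (r - 1)(c - 1)
```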

A detailed explanation would be long, so it is omitted here. Please see the following sites and document.

## Implementation

The following is an implementation of the chi-square test.

### Import libraries

    import numpy as np
    import pandas as pd
    import scipy as sp
    from scipy import stats

### Data preparation

| | group A | group B | group C |
|---|---|---|---|
| success | 23 | 65 | 158 |
| failure | 100 | 44 | 119 |
| success rate | 0.187 | 0.596 | 0.570 |
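The success rates in the table are simply success / (success + failure) for each group. A minimal sketch that recomputes them with NumPy (the variable names are chosen here for illustration):

```python
import numpy as np

# Counts taken from the table above, in the order group A, B, C.
success = np.array([23, 65, 158])
failure = np.array([100, 44, 119])

# Success rate per group = successes / total trials in that group.
rate = success / (success + failure)
print(np.round(rate, 3))
```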

chi_square_data.csv

    A,B,C
    23,65,158
    100,44,119

### Read and Set Data

    csv_line = []
    with open('chi_square_data.csv') as f:
        for line in f:
            line = line.strip()  # remove the trailing newline
            if not line:
                continue  # skip blank lines
            # Keep the fields as strings: the header row ('A', 'B', 'C')
            # cannot be converted to float. Counts are cast to int later.
            csv_line.append(line.split(','))

    group = csv_line[0]
    success = [int(n) for n in csv_line[1]]
    failure = [int(n) for n in csv_line[2]]
    groups = []
    result = []
    count = []
    for i in range(len(group)):
        groups += [group[i], group[i]]     # ['A', 'A', 'B', 'B', 'C', 'C']
        result += ['success', 'failure']   # ['success', 'failure', 'success', 'failure', 'success', 'failure']
        count += [success[i], failure[i]]  # [23, 100, 65, 44, 158, 119]
    data = pd.DataFrame({
        'groups': groups,
        'result': result,
        'count': count
    })

    cross_data = pd.pivot_table(
        data=data,
        values='count',
        aggfunc='sum',
        index='groups',
        columns='result'
    )
    print(cross_data)
    # >> result  failure  success
    #    groups
    #    A           100       23
    #    B            44       65
    #    C           119      158
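As an alternative to parsing the file by hand, the same cross table can probably be built more directly with `pandas.read_csv`. This is only a sketch, not the approach used in this article: it reads the CSV content from an in-memory string (`csv_text`, introduced here so the example runs without the file) and assumes the first data row holds successes and the second holds failures.

```python
import io

import pandas as pd

# Same content as chi_square_data.csv, held in a string for a self-contained example.
csv_text = "A,B,C\n23,65,158\n100,44,119\n"

# Columns are the groups; row 0 is assumed to be successes, row 1 failures.
df = pd.read_csv(io.StringIO(csv_text))
df.index = ['success', 'failure']

# Transpose so that groups become rows, then order the columns like the pivot table.
cross_data = df.T[['failure', 'success']].rename_axis(index='groups', columns='result')
print(cross_data)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the path `'chi_square_data.csv'`.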

### Chi-square test

    print(stats.chi2_contingency(cross_data, correction=False))
    # >> (57.23616422920877,
    #     3.726703617716424e-13,
    #     2,
    #     array([[ 63.554,  59.446],
    #            [ 56.32 ,  52.68 ],
    #            [143.126, 133.874]]))

**chi2**: 57.23616422920877. The test statistic.

**p**: 3.726703617716424e-13. The p-value of the test.

**dof**: 2. Degrees of freedom.

**expected**: array. The expected frequencies, based on the marginal sums of the table.
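To see where these four values come from, the following sketch recomputes them by hand with NumPy. The counts are hard-coded from the cross table above; `scipy.stats.chi2.sf` (the survival function of the chi-square distribution) is used for the p-value. It is meant as a check of the reported numbers, not as a replacement for `chi2_contingency`.

```python
import numpy as np
from scipy import stats

# Observed counts: rows are groups A, B, C; columns are failure, success.
observed = np.array([[100, 23],
                     [44, 65],
                     [119, 158]])

# Marginal sums of the table.
row_totals = observed.sum(axis=1, keepdims=True)  # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
total = observed.sum()

# Expected frequencies under independence: row total * column total / grand total.
expected = row_totals * col_totals / total

# Pearson's chi-square statistic (no continuity correction).
chi2 = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# p-value: probability of a statistic at least this large under the null hypothesis.
p = stats.chi2.sf(chi2, dof)

print(chi2, p, dof)
print(np.round(expected, 3))
```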

The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.

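A result is conventionally called statistically significant when the p-value is below the chosen significance level (typically 0.05). Here p ≈ 3.7e-13, so the null hypothesis of independence is rejected: the success rate differs significantly among the groups.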
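The decision rule can be sketched in code as follows. The significance level `alpha = 0.05` is a conventional choice assumed here, and the critical value from `scipy.stats.chi2.ppf` is shown only as an equivalent way of expressing the same decision.

```python
from scipy import stats

alpha = 0.05  # assumed significance level (a common convention, not a fixed rule)

# Observed counts: rows are groups A, B, C; columns are failure, success.
observed = [[100, 23], [44, 65], [119, 158]]
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)

# Equivalent criteria: p < alpha, or chi2 above the critical value for this dof.
critical_value = stats.chi2.ppf(1 - alpha, dof)
reject_null = p < alpha

print(f"chi2 = {chi2:.3f}, critical value = {critical_value:.3f}, p = {p:.3e}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

Note that rejecting the null hypothesis only indicates that the groups are not all alike; it does not by itself say which groups differ.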