# Chi-Square Test in Python

## GOAL

To write a program that performs a chi-square test in Python.

## What is the chi-square test?

The chi-square test, which here means Pearson's chi-square test, is a statistical hypothesis test used for goodness of fit and for independence.
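For reference, both uses of the test rely on the same statistic. The symbols are notation introduced here for illustration: $O_i$ is the observed count in cell $i$ and $E_i$ is the count expected under the null hypothesis.

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```

A large value means the observed counts are far from the expected ones.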

A goodness-of-fit test determines whether an observed frequency distribution matches a theoretical distribution.
An independence test determines whether two categorical variables, whose observations are summarized in a contingency table (for example a 2×2 table), are independent of each other.
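The rest of this post covers the independence test, so here is a minimal sketch of the goodness-of-fit case using `scipy.stats.chisquare`. The die-roll counts are hypothetical data made up for illustration.

```python
from scipy import stats

# Hypothetical data: how often each face appeared in 120 rolls of a die.
observed = [18, 22, 16, 25, 19, 20]
# A fair die predicts 120 / 6 = 20 rolls per face.
expected = [20] * 6

# chisquare returns the test statistic and the p-value.
statistic, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(statistic, p_value)
```

With these counts the p-value is well above 0.05, so the data give no reason to doubt that the die is fair.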

The full details are lengthy, so please refer to a statistics reference for them.

## Implementation

The following is an implementation of the chi-square test.

### Import libraries

```
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats
```

### Data preparation

| | group A | group B | group C |
|---|---|---|---|
| success | 23 | 65 | 158 |
| failure | 100 | 44 | 119 |
| success rate | 0.187 | 0.596 | 0.570 |
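The success rates can be reproduced from the counts. This is a minimal sketch; the variable names `counts` and `success_rate` are chosen here for illustration.

```python
import pandas as pd

# Success and failure counts for each group, copied from the data above.
counts = pd.DataFrame(
    {'success': [23, 65, 158], 'failure': [100, 44, 119]},
    index=['A', 'B', 'C'],
)

# Success rate = successes / (successes + failures) for each group.
success_rate = counts['success'] / counts.sum(axis=1)
print(success_rate.round(3))
```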

chi_square_data.csv

```
A,B,C
23,65,158
100,44,119
```

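```
csv_line = []
with open('chi_square_data.csv') as f:
    for row_number, line in enumerate(f):
        # strip() removes the trailing newline before splitting on commas.
        items = line.strip().split(',')
        if row_number > 0:
            # Only the data rows hold numbers; the header row keeps its group labels.
            items = [float(item) for item in items]
        csv_line.append(items)
```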
```
group = csv_line[0]
success = [int(n) for n in csv_line[1]]
failure = [int(n) for n in csv_line[2]]

groups = []
result = []
count = []
for i in range(len(group)):
    groups += [group[i], group[i]]      # ['A', 'A', 'B', 'B', 'C', 'C']
    result += ['success', 'failure']    # ['success', 'failure', 'success', 'failure', 'success', 'failure']
    count += [success[i], failure[i]]   # [23, 100, 65, 44, 158, 119]

data = pd.DataFrame({
    'groups': groups,
    'result': result,
    'count': count
})
```
```
cross_data = pd.pivot_table(
    data=data,
    values='count',
    aggfunc='sum',
    index='groups',
    columns='result'
)
print(cross_data)
# Output:
# result  failure  success
# groups
# A           100       23
# B            44       65
# C           119      158
```
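As an alternative to parsing the file by hand, the same table could be built with `pandas.read_csv`. The sketch below is not the method used in this post. It reads the CSV text from an in-memory string so that it runs on its own, and the row labels `success` and `failure` are an assumption taken from the data table, since the file itself does not contain them.

```python
import io

import pandas as pd

# Stand-in for chi_square_data.csv so the sketch is self-contained.
csv_text = "A,B,C\n23,65,158\n100,44,119\n"

# read_csv uses the first row as column names and parses the counts as integers.
raw = pd.read_csv(io.StringIO(csv_text))
raw.index = ['success', 'failure']  # assumed labels: row 1 is success, row 2 is failure

# Transpose so that each group is a row and each outcome is a column.
table = raw.T
print(table)
```

The columns come out as success then failure, the reverse of the pivot table. The order of rows or columns does not affect the chi-square statistic.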

### Chi-square test

```
print(stats.chi2_contingency(cross_data, correction=False))
# Output:
# (57.23616422920877, 3.726703617716424e-13, 2, array([[ 63.554,  59.446],
#        [ 56.32 ,  52.68 ],
#        [143.126, 133.874]]))
```
- chi2 : 57.23616422920877
  - The test statistic
- p : 3.726703617716424e-13
  - The p-value of the test
- dof : 2
  - Degrees of freedom
- expected : array
  - The expected frequencies, based on the marginal sums of the table.
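To see where these four values come from, they can be recomputed by hand with NumPy. This is a sketch for checking the result, not a description of how SciPy computes it internally. The counts are typed in directly, with columns in the order failure, success.

```python
import numpy as np
from scipy import stats

# Observed counts: rows are groups A, B, C; columns are failure, success.
observed = np.array([[100, 23], [44, 65], [119, 158]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
total = observed.sum()

# Expected count under independence: row total * column total / grand total.
expected = row_totals * col_totals / total

# Pearson's statistic sums (observed - expected)^2 / expected over all cells.
chi2 = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# The p-value is the upper-tail probability of the chi-square distribution.
p_value = stats.chi2.sf(chi2, dof)
print(chi2, p_value, dof)
print(expected.round(3))
```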

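The smaller the p-value, the stronger the evidence against the null hypothesis.
When the p-value is below the chosen significance level (typically 0.05), the result is statistically significant. Here p is about 3.7e-13, so we reject the null hypothesis that the result is independent of the group: the success rates differ significantly between groups.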
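The decision rule can be written out in code. This sketch assumes a significance level of 0.05 and copies the statistic, p-value and degrees of freedom from the output above. The names `alpha`, `critical_value` and `reject` are chosen here for illustration.

```python
from scipy import stats

alpha = 0.05  # significance level, a conventional choice

# Values copied from the chi2_contingency output.
chi2 = 57.23616422920877
p_value = 3.726703617716424e-13
dof = 2

# Critical value: the statistic must exceed this to be significant at alpha.
critical_value = stats.chi2.ppf(1 - alpha, dof)

# Comparing p with alpha is equivalent to comparing chi2 with the critical value.
reject = p_value < alpha
if reject:
    print('Reject the null hypothesis: the result depends on the group.')
else:
    print('Fail to reject the null hypothesis.')
```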