Interpreting A/B Test using Python
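Suppose we ran an A/B test with two different versions of a web page, A and B, and recorded for each version how many visitors did and did not convert. The counts form a contingency table (the figures below are those in n10000.csv, listed later in this post):

Version | Not converted | Converted |
---|---|---|
A | 4514 | 486 |
B | 4473 | 527 |
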
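It is trivial to compute the conversion rate of each version: divide the number of conversions by the total number of visitors who saw that version. The harder question is whether a difference between the two rates is statistically significant or merely due to chance, and answering it calls for a hypothesis test.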
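As a quick illustration of the conversion-rate arithmetic, the sketch below uses the counts from n10000.csv (listed later in this post). The helper name `conversion_rate` is made up for this example and is not part of the original script.

```python
def conversion_rate(not_converted, converted):
    """Return the fraction of visitors who converted."""
    return converted / (not_converted + converted)


# Counts taken from n10000.csv: (not converted, converted) per version.
counts = {"A": (4514, 486), "B": (4473, 527)}

rates = {version: conversion_rate(*c) for version, c in counts.items()}
for version, rate in rates.items():
    print("{}: {:.2%}".format(version, rate))
```

Version B comes out slightly ahead, but the rates alone say nothing about whether the gap could be due to chance.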
Background
An appropriate hypothesis test here is Pearson’s chi-squared test. There are two types of the chi-squared test, the goodness of fit and the test of independence, but it is the latter that is useful for the case in question. The reason why a test of “independence” is applicable becomes clear by converting the contingency table into a probability matrix by dividing each element by the grand total of frequencies:
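
Version | Not converted ($C = f$) | Converted ($C = t$) |
---|---|---|
$V = A$ | $P(V = A, C = f)$ | $P(V = A, C = t)$ |
$V = B$ | $P(V = B, C = f)$ | $P(V = B, C = t)$ |

A table like this is sometimes called a correspondence matrix. Here, the table consists of joint probabilities $P(V, C)$, where $V$ is the version of the web page ($A$ or $B$) and $C$ is the conversion result ($f$ for not converted, $t$ for converted).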
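The conversion from counts to joint probabilities can be sketched in a few lines of plain Python. The counts are those in n10000.csv; the variable names are illustrative only.

```python
# Observed counts from n10000.csv: rows are versions A and B,
# columns are (not converted, converted).
observed = [[4514, 486],
            [4473, 527]]

# Grand total of all frequencies.
n = sum(sum(row) for row in observed)

# Joint probabilities: each count divided by the grand total.
joint = [[count / n for count in row] for row in observed]

for label, row in zip("AB", joint):
    print(label, row)
```

By construction the four entries of `joint` sum to one, which is what makes the table a probability matrix.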
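Now, our interest is whether the conversion outcome $C$ depends on the page version $V$ and, if it does, which version converts better. In probability theory, two events are said to be independent if their joint probability factorizes into the product of the marginal probabilities:

$$P(V, C) = P(V)\,P(C),$$

where the marginal probabilities $P(V)$ and $P(C)$ follow from the row and column totals of the contingency table. Writing $O_{ij}$ for the observed count in row $i$ (version) and column $j$ (conversion outcome), and $N = \sum_{i,j} O_{ij}$ for the grand total,

$$P(V = i) = \frac{1}{N} \sum_j O_{ij}, \qquad P(C = j) = \frac{1}{N} \sum_i O_{ij}.$$

The null hypothesis is that $V$ and $C$ are independent, in which case the elements of the probability matrix become:

Version | Not converted ($C = f$) | Converted ($C = t$) |
---|---|---|
$V = A$ | $P(V = A)\,P(C = f)$ | $P(V = A)\,P(C = t)$ |
$V = B$ | $P(V = B)\,P(C = f)$ | $P(V = B)\,P(C = t)$ |

The conversion $C$ is said to depend on the version $V$ if this null hypothesis is rejected. Rejecting the null hypothesis therefore means that one version converts better than the other, which is why the test is one of independence.

The chi-squared test compares an observed distribution $O_{ij}$ to an expected distribution $E_{ij}$ through the statistic

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

where $i$ and $j$ are the row and column indices of the matrix and $E_{ij} = N\,P(V = i)\,P(C = j)$ is the count expected under the null hypothesis. The statistic is then compared against the chi-squared distribution with the appropriate number of degrees of freedom (one for a $2 \times 2$ table) to obtain a $p$-value; a small $p$-value is evidence against independence.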
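To make the statistic concrete, here is a minimal pure-Python sketch that builds the expected counts under independence and sums the Pearson terms for the n10000.csv counts. The function names are hypothetical, no continuity correction is applied, and the $p$-value shortcut via `math.erfc` is valid only for one degree of freedom, as in a $2 \times 2$ table.

```python
import math

# Observed counts from n10000.csv: rows are versions, columns are outcomes.
observed = [[4514, 486],
            [4473, 527]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson's chi-squared statistic, without any continuity correction."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))


expected = expected_frequencies(observed)
chi2 = pearson_chi2(observed)
# For one degree of freedom, the upper-tail probability of the
# chi-squared distribution equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(expected)
print("chi2 = {:.4f}, p = {:.4f}".format(chi2, p_value))
```

Because this sketch omits the continuity correction, its numbers differ slightly from what `scipy.stats.chi2_contingency` reports with its default settings for a $2 \times 2$ table.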
Python Implementation
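Fortunately, it is very straightforward to carry out this hypothesis test using scipy.stats.chi2_contingency. All we need is to supply the function with a contingency matrix, and it will return the $\chi^2$ statistic and the corresponding $p$-value (followed by the degrees of freedom and the expected frequencies):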

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    """An example of A/B test using the chi-squared test for independence."""
    import pandas as pd
    from scipy.stats import chi2_contingency


    def run_test(filename):
        """Load a contingency table from a CSV file and run the test."""
        # Rows are page versions; columns are not-converted/converted counts.
        data = pd.read_csv(filename)
        data = data.set_index('version')
        observed = data.values
        print(observed)
        # chi2_contingency returns (chi2, p, dof, expected); keep the first two.
        result = chi2_contingency(observed)
        chisq, p = result[:2]
        print('chisq = {}, p = {}'.format(chisq, p))


    def main():
        run_test('n10000.csv')
        print()
        run_test('n40000.csv')


    if __name__ == '__main__':
        main()

n10000.csv:
version | not converted | converted |
---|---|---|
A | 4514 | 486 |
B | 4473 | 527 |
n40000.csv:
version | not converted | converted |
---|---|---|
A | 17998 | 2002 |
B | 17742 | 2258 |
(The code and data are available in Gist.)
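The result for the original table (of $n = 10000$) is $\chi^2 \approx 1.76$ and $p \approx 0.185$. Since $p > 0.05$, we cannot reject the null hypothesis at the conventional 5% significance level: the data do not show that one version converts better than the other.
What if we keep running the same A/B test a bit longer until we accumulate $n = 40000$ visitors? With the counts in n40000.csv, the conversion rates are about 10.0% for A and 11.3% for B, and the same test gives $\chi^2 \approx 17.1$ and $p \approx 3.6 \times 10^{-5}$. Now $p$ is far below 0.05, so the null hypothesis of independence is rejected and the difference between the versions is statistically significant.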
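To see how a similar relative difference becomes significant with more data, here is a pure-Python cross-check on both data files. It is a sketch that assumes a $2 \times 2$ table (one degree of freedom) and applies Yates's continuity correction, which `chi2_contingency` uses by default for such tables. The function name `yates_chi2_test` is hypothetical.

```python
import math


def yates_chi2_test(table):
    """Chi-squared test of independence for a 2x2 table with Yates's correction.

    Returns (chi2, p). The p-value formula is valid only for one degree
    of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            # Shrink |O - E| by 0.5, but never past zero.
            diff = max(abs(o - e) - 0.5, 0.0)
            chi2 += diff ** 2 / e
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts copied from the two data files shown above.
tables = {
    "n10000.csv": [[4514, 486], [4473, 527]],
    "n40000.csv": [[17998, 2002], [17742, 2258]],
}
results = {name: yates_chi2_test(t) for name, t in tables.items()}
for name, (chi2, p) in results.items():
    print("{}: chisq = {:.3f}, p = {:.3g}".format(name, chi2, p))
```

The smaller table stays above the conventional 0.05 threshold, while the larger one falls far below it.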
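- For a $2 \times 2$ contingency table, Yates's chi-squared test is commonly used. This applies a correction of the form $\chi^2_{\text{Yates}} = \sum_{i,j} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}$ to account for the error between the observed discrete distribution and the continuous chi-squared distribution. scipy.stats.chi2_contingency applies this correction by default when the table has one degree of freedom.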