PCA and Biplot using Python
There are several ways to run principal component analysis (PCA) in Python: using packages such as scikit-learn or statsmodels, or even rolling your own through singular-value decomposition (SVD). A PCA result can be visualized with a biplot. I was looking at an example of using prcomp and biplot in R, but there does not seem to be a comparable plug-and-play way of generating a biplot in Python.
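Here is a minimal NumPy sketch of the "roll your own" route through SVD. It assumes the data are an (observations × variables) array and that, as with prcomp's default, the variables are centered but not scaled. The function name pca_svd is hypothetical. The columns of the returned rotation play the role of prcomp's rotation matrix, up to sign, because the signs of the principal axes are arbitrary.

```python
import numpy as np

def pca_svd(X):
    """PCA via SVD of the column-centered data (hypothetical helper).

    Returns (scores, rotation, explained_variance). The columns of `rotation`
    are the principal axes, comparable to prcomp()$rotation up to sign.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each variable (no scaling)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    rotation = Vt.T                                # one column per principal component
    scores = Xc @ rotation                         # same as U * s
    explained_variance = s**2 / (X.shape[0] - 1)   # eigenvalues of the sample covariance
    return scores, rotation, explained_variance
```

Because SVD returns singular values in descending order, the components come out already sorted by explained variance.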
As it turns out, generating a biplot from the result of PCA with statsmodels' pcasvd is fairly straightforward, using the rotation matrix that the function supplies. Here is a code snippet:
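The sketch below stands in for the author's snippet and is not that snippet. It uses NumPy's SVD instead of pcasvd so it runs without statsmodels. The idea is to project the observations onto two principal axes, rescale those scores into [-1, 1], and draw each variable's row of the rotation matrix as an arrow from the origin. The function names biplot_coords and draw_biplot are hypothetical. The max-abs rescaling is just one convenient choice and is not the scaling R's biplot uses. draw_biplot only assumes that a matplotlib Axes object is passed in.

```python
import numpy as np

def biplot_coords(X, pcs=(0, 1)):
    """Return (points, arrows) for a biplot of the two components in `pcs`.

    points: (n_obs, 2) scores, each axis rescaled into [-1, 1]
    arrows: (n_vars, 2) loadings, i.e. the rotation-matrix rows for those PCs
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    rotation = Vt.T[:, list(pcs)]                  # (n_vars, 2) principal axes
    scores = Xc @ rotation                         # project observations onto them
    points = scores / np.abs(scores).max(axis=0)   # fit scores to the arrows' range
    return points, rotation


def draw_biplot(ax, points, arrows, var_names, colors=None, pcs=(0, 1)):
    """Draw points and loading arrows on a matplotlib Axes supplied by the caller."""
    ax.scatter(points[:, 0], points[:, 1], c=colors, s=15)
    for (dx, dy), name in zip(arrows, var_names):
        ax.arrow(0, 0, dx, dy, color="r", alpha=0.6, head_width=0.02)
        ax.text(1.1 * dx, 1.1 * dy, name, color="r", ha="center", va="center")
    ax.set_xlabel(f"PC{pcs[0] + 1}")
    ax.set_ylabel(f"PC{pcs[1] + 1}")

# Usage (requires matplotlib; `data` and `column_names` are placeholders):
#   import matplotlib.pyplot as plt
#   fig, ax = plt.subplots()
#   points, arrows = biplot_coords(data)
#   draw_biplot(ax, points, arrows, var_names=column_names)
#   plt.show()
```

Each rotation column has unit length, so every loading lies in [-1, 1]. Rescaling the scores into the same range keeps both layers visible on one set of axes.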
In addition to PCA, $k$-means clustering (with three clusters) was run on the data to color the observations by cluster membership. The resulting biplot for states.x77 (which I exported from R) looks like this:
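Here is a minimal sketch of the clustering step with three clusters, as in the text. The post does not say which $k$-means implementation was used. This version uses plain NumPy Lloyd's iterations with a farthest-point initialization, a deterministic simplification that is not k-means++. The function name kmeans is hypothetical. In practice a library routine such as scikit-learn's KMeans would be the usual choice.

```python
import numpy as np

def kmeans(X, k=3, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization (hypothetical helper).

    Returns (labels, centers); labels[i] is the cluster index of row i of X.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    def sq_dists(C):
        # (n_obs, n_centers) matrix of squared Euclidean distances
        return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

    # Start from one random row, then repeatedly add the row that is
    # farthest from every center chosen so far.
    centers = X[[rng.integers(len(X))]]
    while len(centers) < k:
        far = sq_dists(centers).min(axis=1).argmax()
        centers = np.vstack([centers, X[far]])

    for _ in range(n_iter):
        labels = sq_dists(centers).argmin(axis=1)  # assign rows to nearest center
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return sq_dists(centers).argmin(axis=1), centers
```

You can pass the returned labels as the c= argument of matplotlib's scatter so each observation in the biplot is colored by its cluster. Farthest-point initialization is sensitive to outliers, which is one reason library implementations default to k-means++ with several restarts.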