PCA and Biplot using Python

Written by Taro Sato on . Tagged: Python stats visualization

There are several ways to run principal component analysis (PCA) using various packages (scikit-learn, statsmodels, etc.) or even just rolling out your own through singular-value decomposition and such. Visualizing the PCA result can be done through biplot. I was looking at an example of using prcomp and biplot in R, but it does not seem like there is a comparable plug-and-play way of generating a biplot on Python.

As it turns out, generating a biplot from the result of PCA by pcasvd of StatsModels is fairly straightforward from the rotation matrix supplied by the function. Here is a code snippet:

In addition to PCA, $k$-means clustering (three clusters) was run on the data to color the observations by how they cluster. The resulting biplot for states.x77 (which I exported and borrowed from R) looks like this:


comments powered by Disqus