Biboroku

Attribute Access with Dict

Written by Taro Sato on . Tagged: Python

Python's dict is useful. Access to a nested item can be tedious, however. For example,

    data = {
        "hosts": {
            "name": "localhost",
            "cidr": "127.0.0.1/8",
        }
    }

Here, data["hosts"]["cidr"] would get you "127.0.0.1/8", but all those quotes and brackets can be annoying to type and read. ... Continue reading.
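As a rough illustration of the idea in the teaser above, here is a minimal sketch of attribute-style access on a nested dict. The AttrDict class is a hypothetical name introduced here for illustration and is not necessarily the approach the full post takes.

```python
class AttrDict(dict):
    """Hypothetical dict subclass that exposes keys as attributes."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so real dict methods such as .items() keep working.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested plain dicts so chained attribute access works.
        if isinstance(value, dict) and not isinstance(value, AttrDict):
            value = AttrDict(value)
        return value


data = AttrDict({"hosts": {"name": "localhost", "cidr": "127.0.0.1/8"}})
cidr = data.hosts.cidr  # same value as data["hosts"]["cidr"]
```

One limitation of this sketch: a key that collides with a dict method name (for example "items") is still reachable only through bracket access.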

On Lazy Logging Evaluation

Written by Taro Sato on . Tagged: Python

The stdlib logging package in Python encourages C-style message format strings, with the variables passed as arguments to its logging methods. For example,

    logging.debug("Result x = %d, y = %d" % (x, y))  # Bad
    logging.debug("Result x = %d, y = %d", x, y)  # Good

or ... Continue reading.
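The difference between the two calls can be seen in a small sketch. The Expensive class and the "lazy_demo" logger name are hypothetical, made up for this illustration; the point is that when a record is filtered out by level, arguments passed separately are never formatted.

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is rendered as a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "expensive"


logger = logging.getLogger("lazy_demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.INFO)  # DEBUG records are dropped at this level

eager = Expensive()
lazy = Expensive()

# The % operator runs before debug() is called, so __str__ is invoked
# even though the record is discarded.
logger.debug("value = %s" % eager)

# The argument is passed through unformatted; since DEBUG is disabled,
# the message is never built and __str__ is never invoked.
logger.debug("value = %s", lazy)
```

After running this, eager.calls is 1 while lazy.calls stays at 0.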

Interpreting A/B Test using Python

Written by Taro Sato on . Tagged: Python stats visualization

Suppose we ran an A/B test with two different versions of a web page, $a$ and $b$, for which we count the number of visitors and whether they convert or not. We can summarize this in a contingency table showing the frequency distribution of the events: ... Continue reading.
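For a concrete feel of such a contingency table, here is a minimal sketch that runs a Pearson chi-squared test of independence using only the standard library. The counts are hypothetical, made up for illustration, and the chi-squared test is one common choice here, not necessarily the analysis used in the full post.

```python
import math

# Hypothetical counts, made up for illustration. Rows are versions a and b;
# columns are (converted, not converted).
table = [
    [20, 180],
    [40, 160],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
total = sum(row_totals)

# Pearson chi-squared statistic, without continuity correction.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (observed - expected) ** 2 / expected

# A 2x2 table has one degree of freedom, for which the chi-squared
# survival function reduces to erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
```

With these made-up counts the statistic is about 7.84, giving a p-value of roughly 0.005.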

Brand Positioning by Correspondence Analysis

Written by Taro Sato on . Tagged: Python stats visualization

I was reading an article about visualization techniques using multidimensional scaling (MDS), correspondence analysis in particular. The example used R, but as usual I want to find a way to do it in Python, so here goes. Correspondence analysis is useful when you have a two-way contingency table for which relative values of ratio-scaled data are of interest. ... Continue reading.

PCA and Biplot using Python

Written by Taro Sato on . Tagged: Python stats visualization

There are several ways to run principal component analysis (PCA) using various packages (scikit-learn, statsmodels, etc.) or even by rolling your own through singular-value decomposition and such. The PCA result can be visualized with a biplot. I was looking at an example of using prcomp and biplot in R, but there does not seem to be a comparable plug-and-play way of generating a biplot in Python. ... Continue reading.

Near-Duplicate Detection using MinHash: Background

Written by Taro Sato on . Tagged: math Python stats

There are numerous pieces of duplicate information served by multiple sources on the web. Many news stories that we receive from the media tend to originate from the same source, such as the Associated Press. When such content is scraped off the web for archiving, a need may arise to categorize documents by their similarity (not in the sense of the meaning of the text but of character-level or lexical matching). ... Continue reading.
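Character-level similarity of this kind is often measured as the Jaccard similarity between sets of character shingles, which is the quantity that MinHash estimates. The sketch below computes it exactly; the function names, the shingle length k=5, and the sample sentences are all assumptions chosen for illustration.

```python
def shingles(text, k=5):
    """Return the set of all k-character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Return |a & b| / |a | b|, treating two empty sets as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Made-up sample documents: doc2 is a near-duplicate of doc1, doc3 is not.
doc1 = "The quick brown fox jumps over the lazy dog."
doc2 = "The quick brown fox jumped over the lazy dog."
doc3 = "An entirely different sentence about something else."

sim_near = jaccard(shingles(doc1), shingles(doc2))
sim_far = jaccard(shingles(doc1), shingles(doc3))
```

Comparing every pair of documents this way scales poorly with the size of the collection, which is the motivation for approximating the similarity with compact signatures instead.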

Customizing & Installing Linux Kernel on Debian Wheezy

Written by Taro Sato on . Tagged: Linux sysadmin

Here is a quickie for customizing and installing Linux kernel 3.5.x on Wheezy. Add yourself (with account username) to the sudo group:

    # adduser username sudo

You need to log out and log back in for this change to take effect. You also need to be able to use sudo or su to install the new kernel in the end. ... Continue reading.

Using Japanese on Debian Wheezy

Written by Taro Sato on . Tagged: sysadmin Linux

The goal is to make the system capable of Japanese input while keeping the base system in English. For the Japanese input method, I had been using Anthy, but I will be using mozc, which is now better supported and presumably much better (it is). ... Continue reading.

Searching for Nearest-Neighbors between Two Coordinate Catalogs

Written by Taro Sato on . Tagged: astro stats

Say I have two catalogs of points, each in two-dimensional space. For each object in one catalog, I want to find the nearest object(s) in the other catalog. I can do this by computing the distance between every pair of objects, keeping the ones within a search radius, and possibly doing an additional sort. ... Continue reading.
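The brute-force approach described above can be sketched as follows. The function name, the sample coordinates, and the search radius are hypothetical, and plain Euclidean distance in a flat plane is assumed.

```python
import math


def neighbors_within(catalog_a, catalog_b, radius):
    """For each point in catalog_a, return a list of (distance, index)
    pairs for the points in catalog_b within radius, nearest first."""
    matches = []
    for ax, ay in catalog_a:
        found = []
        for j, (bx, by) in enumerate(catalog_b):
            d = math.hypot(ax - bx, ay - by)
            if d <= radius:
                found.append((d, j))
        found.sort()  # tuples sort by distance first
        matches.append(found)
    return matches


# Made-up sample catalogs of (x, y) points.
catalog_a = [(0.0, 0.0), (5.0, 5.0)]
catalog_b = [(0.0, 1.0), (3.0, 4.0), (5.0, 5.5), (0.5, 0.0)]

matches = neighbors_within(catalog_a, catalog_b, radius=2.0)
```

Every point in one catalog is compared against every point in the other, so the cost grows with the product of the two catalog sizes.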