Biboroku

Polymorphism with Single Dispatch in Python

Written by taro, on . Tagged: python

Python does not natively support function overloading, a feature commonly used in languages like C++ to select different behaviors based on function signatures. However, Python does offer a way to achieve a degree of polymorphism, although in an extremely limited scope. ... Continue reading.
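
As a rough illustration of the idea, here is a minimal sketch using functools.singledispatch from the standard library. The function name describe and its return strings are made up for this example. The "limited scope" shows up in that dispatch happens only on the type of the first argument.

```python
from functools import singledispatch


@singledispatch
def describe(value):
    # Fallback used when no more specific type has been registered.
    return f"object: {value!r}"


@describe.register
def _(value: int):
    # Chosen when the first argument is an int (or a subclass, such as bool).
    return f"integer: {value}"


@describe.register
def _(value: list):
    # Chosen when the first argument is a list.
    return f"list of {len(value)} items"


print(describe(3))
print(describe([1, 2]))
print(describe("hi"))
```

Only the first positional argument takes part in the dispatch, so this is not a substitute for full signature-based overloading.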

Creating a Debian Boot Partition on a USB Flash Drive

Written by Taro Sato, on . Tagged: linux

Having a boot partition on a USB flash drive provides two major benefits. One is to have the USB flash drive work as a “security key” without which the computer cannot boot into an OS; effectively, the boot USB flash drive becomes a house key. ... Continue reading.

Snipping Transcripts with Snipd

Written by Taro Sato, on . Tagged: snipd podcast pkm

For the past few years, podcasts have served as a valuable source of both learning and inspiration for me. However, I quickly realized that integrating audio into a personal knowledge management (PKM) workflow is not a simple task. Fortunately, the emergence of new podcast apps is changing this situation. ... Continue reading.

Attribute Access with Dict

Written by Taro Sato, on . Tagged: Python

Python dict is useful. Access to a nested item can be tedious, however. For example, data = { "hosts": { "name": "localhost", "cidr": "127.0.0.1/8", } } Here, data["hosts"]["cidr"] would get you "127.0.0.1/8", but all those quotes and brackets can be annoying to type and read. ... Continue reading.
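
One possible way to get attribute-style access is sketched below. AttrDict is a hypothetical class name chosen for this example and is not necessarily the approach taken in the full post. It relies on __getattr__, which Python calls only when normal attribute lookup fails.

```python
class AttrDict(dict):
    """Hypothetical dict subclass that exposes keys as attributes."""

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so that chained attribute access keeps working.
        return AttrDict(value) if isinstance(value, dict) else value


data = AttrDict({"hosts": {"name": "localhost", "cidr": "127.0.0.1/8"}})
print(data.hosts.cidr)
```

A caveat of this sketch is that keys clashing with dict method names (for example, "items") are still reachable only through the bracket syntax.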

On Lazy Logging Evaluation

Written by Taro Sato, on . Tagged: Python

The stdlib logging package in Python encourages C-style message format strings, with variables passed as arguments to its logging methods. For example, logging.debug("Result x = %d, y = %d" % (x, y)) # Bad logging.debug("Result x = %d, y = %d", x, y) # Good or ... Continue reading.
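
To make the difference concrete, here is a small sketch. The Expensive class and the logger name lazy_demo are made up for this example. It counts how often an object is turned into a string when a DEBUG message is emitted on a logger that is set to INFO.

```python
import logging


class Expensive:
    """Hypothetical object that counts how often it is formatted."""

    def __init__(self):
        self.formatted = 0

    def __str__(self):
        self.formatted += 1
        return "expensive"


logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.INFO)  # DEBUG messages are dropped at this level

eager = Expensive()
lazy = Expensive()

# The % operator formats the string before debug() is even called.
logger.debug("value = %s" % eager)
# Here formatting is deferred to the logging machinery, which skips it
# because DEBUG is not enabled for this logger.
logger.debug("value = %s", lazy)

print(eager.formatted, lazy.formatted)
```

The eager form pays the formatting cost even though the message is discarded, while the argument form does not.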

Creating a Debian Bootable USB Stick with Non-Free Firmware

Written by Taro Sato, on . Tagged: sysadmin Linux

Debian installation on new hardware can be a hassle when it depends on non-free firmware support. A typical workaround is to use a Debian install image that includes non-free drivers, which is available here: Unofficial non-free images including firmware packages. Choose the right image for the kind of USB media you wish to create. ... Continue reading.

Interpreting A/B Test using Python

Written by Taro Sato, on . Tagged: Python stats visualization

Suppose we ran an A/B test with two different versions of a web page, a and b, for which we count the number of visitors and whether they convert or not. We can summarize this in a contingency table showing the frequency distribution of the events: ... Continue reading.
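
As a sketch of how such a table might be analyzed, the snippet below runs a Pearson chi-square test of independence on a 2x2 table using only the standard library. The counts are made up for illustration, the function name chi_square_2x2 is hypothetical, and the full post may well use a different test or library.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of version and outcome.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up counts: rows are versions a and b,
# columns are (converted, not converted).
table = [[50, 950], [70, 930]]
stat, p_value = chi_square_2x2(table)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}")
```

A small p-value would suggest that the conversion rate depends on the version shown. The closed-form p-value used here is valid only for the single degree of freedom of a 2x2 table.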

Brand Positioning by Correspondence Analysis

Written by Taro Sato, on . Tagged: Python stats visualization

I was reading an article about visualization techniques using multidimensional scaling (MDS), correspondence analysis in particular. The example used R, but as usual, I want to find ways to do it with Python, so here goes. Correspondence analysis is useful when you have a two-way contingency table for which relative values of ratio-scaled data are of interest. ... Continue reading.

PCA and Biplot using Python

Written by Taro Sato, on . Tagged: Python stats visualization

There are several ways to run principal component analysis (PCA) using various packages (scikit-learn, statsmodels, etc.) or even just rolling your own through singular-value decomposition and such. Visualizing the PCA result can be done through a biplot. I was looking at an example of using prcomp and biplot in R, but it does not seem like there is a comparable plug-and-play way of generating a biplot with Python. ... Continue reading.

Near-Duplicate Detection using MinHash: Background

Written by Taro Sato, on . Tagged: stats Python math

There are numerous pieces of duplicate information served by multiple sources on the web. Many news stories that we receive from the media tend to originate from the same source, such as the Associated Press. When such contents are scraped off the web for archiving, a need may arise to categorize documents by their similarity (not in the sense of the meaning of the text but the character-level or lexical matching). ... Continue reading.
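
To give a feel for character-level similarity, here is a minimal sketch that compares documents by the Jaccard similarity of their character shingles, the quantity that MinHash is commonly used to estimate. The function names, the shingle length k=5, and the sample sentences are all assumptions made for this example, not taken from the full post.

```python
def shingles(text, k=5):
    """Return the set of all k-character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0  # Treat two empty sets as identical.
    return len(a & b) / len(a | b)


# Made-up sample documents.
doc1 = "The quick brown fox jumps over the lazy dog."
doc2 = "The quick brown fox jumped over the lazy dog."
doc3 = "An entirely unrelated sentence about podcasts."

near_duplicate = jaccard(shingles(doc1), shingles(doc2))
unrelated = jaccard(shingles(doc1), shingles(doc3))
print(near_duplicate, unrelated)
```

Computing this exactly for every pair of documents gets expensive for large collections, which is the motivation for approximating it.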