# Near-Duplicate Detection using MinHash: Background

Written by Taro Sato. Tagged: math Python stats

The web is full of duplicate information served by multiple sources. Many news stories that we receive from the media originate from the same source, such as the Associated Press. When such content is scraped off the web for archiving, a need may arise to categorize documents by their similarity (not in the sense of the meaning of the text, but in the sense of character-level or lexical matching). ... Continue reading.

# A Trick for Computing the Sum of Geometric Series

Written by Taro Sato. Tagged: math

Say I need to compute the sum of a series like this one: $$y = 1 + 2 x + 3 x^2 + 4 x^3 + \dots \ , \label{eq:a}$$ where $|x| < 1$. This series looks like a geometric series, in which case the sum can be computed from ... Continue reading.
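As a quick numerical sanity check on the series above (a minimal sketch, not necessarily the trick the full post describes): differentiating the ordinary geometric series $1 + x + x^2 + \dots = 1/(1-x)$ term by term suggests the closed form $y = 1/(1-x)^2$ for $|x| < 1$. The helper names `partial_sum` and `closed_form` are made up for this illustration.

```python
def partial_sum(x, n_terms):
    """Sum of the first n_terms terms of 1 + 2x + 3x^2 + 4x^3 + ..."""
    return sum((k + 1) * x**k for k in range(n_terms))


def closed_form(x):
    """Candidate closed form 1 / (1 - x)^2, valid only for |x| < 1."""
    return 1.0 / (1.0 - x) ** 2


if __name__ == "__main__":
    for x in (0.5, -0.3, 0.9):
        # 2000 terms is far more than needed for these x; it is an arbitrary choice.
        print(x, partial_sum(x, 2000), closed_form(x))
```

The two columns should agree to within floating-point error for any $|x| < 1$, with slower convergence of the partial sum as $|x|$ approaches 1.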

# Half-Light Radii for Various Profiles

Written by Taro Sato. Tagged: astro math

For a profile $I(r, \phi)$, the flux enclosed within the radius $r$ is given by $F(r) = \int_{0}^{2 \pi} d\phi \int_{0}^{r} dr \, r \, I(r, \phi) \ .$ I’m only concerned with azimuthally symmetric cases, so $F(r) = 2\pi \int_{0}^{r} dr \, r \, I(r)$ . ... Continue reading.
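The azimuthally symmetric integral above can be evaluated numerically, and the half-light radius found as the radius enclosing half of the total flux. The sketch below assumes an exponential profile $I(r) = e^{-r/h}$ purely as an example (its total flux is $2\pi h^2$); the function names, the trapezoidal rule, and the bisection search are illustrative choices, not taken from the full post.

```python
import math


def enclosed_flux(profile, r, n=10_000):
    """F(r) = 2*pi * integral from 0 to r of r' * I(r') dr' (trapezoidal rule)."""
    if r <= 0.0:
        return 0.0
    step = r / n
    total = 0.0
    for i in range(n + 1):
        rp = i * step
        weight = 0.5 if i in (0, n) else 1.0  # endpoints count half
        total += weight * rp * profile(rp)
    return 2.0 * math.pi * total * step


def half_light_radius(profile, total_flux, r_max, tol=1e-6):
    """Bisect for the radius at which the enclosed flux is half of total_flux."""
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if enclosed_flux(profile, mid) < 0.5 * total_flux:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed example: exponential profile with scale length h.
h = 1.0


def exponential(r):
    return math.exp(-r / h)


r_half = half_light_radius(exponential, total_flux=2.0 * math.pi * h**2, r_max=20.0 * h)
print(r_half / h)
```

For this assumed profile the enclosed flux has the closed form $F(r) = 2\pi h^2 \left[1 - (1 + r/h) e^{-r/h}\right]$, so the printed ratio should come out close to 1.678.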