Software Development

  • Python
    • Data science framework development with PyData (NumPy, SciPy, pandas, Matplotlib)
    • Bottleneck removal using Cython, SWIG, and C extensions
  • Low-level performance optimization
    • C & C++
  • SQL & NoSQL
    • MySQL, CouchDB, Redis, Memcached
  • Message queuing systems
    • AMQP with RabbitMQ
  • Concurrent computing
    • Multithreading and multiprocessing with gevent and multiprocessing in Python
  • LAMP stack web development
    • Linux, Apache, MySQL, Python/PHP, JavaScript, HTML/CSS

Cloud & Distributed Computing

  • MapReduce
    • Hadoop
    • mrjob for running Hadoop jobs from Python
    • Mahout for machine learning
  • Amazon Web Services
    • Elastic MapReduce (EMR), Simple Storage Service (S3), Elastic Compute Cloud (EC2)
  • High-performance computing
    • Open MPI

Data Analysis & Statistics

  • General statistical methods
    • Hypothesis testing: t-test, chi-squared test, Kolmogorov-Smirnov test
    • Regression analysis: simple and multiple linear regression, non-linear least squares (Levenberg-Marquardt)
    • Principal component analysis (PCA)
  • Bayesian methods
    • MCMC
  • Machine learning
    • Classification: logistic regression, Naive Bayes
    • Clustering: k-means, hierarchical
    • Feature engineering
  • Natural language processing
    • n-gram models
  • Digital signal processing
    • Electromagnetic spectroscopy
    • 2D image processing
  • Data visualization
    • Multidimensional scaling (MDS), correspondence analysis

Documentation, Translation, & Localization

  • English or Japanese as the target language
  • Technical documentation