Installing Hadoop-LZO on Debian Jessie

JDK 1.6 or later must already be installed for this to work.

Install a few packages (as root or via sudo):

$ aptitude install liblzo2-dev maven git

To see where the LZO library is installed, get a list of files installed:

$ dpkg-query -L liblzo2-dev
... list of paths ...

On my box, the include and library paths are /usr/include/lzo and /usr/lib/x86_64-linux-gnu, respectively. (These paths should be picked up automatically, but if the later mvn build needs to be pointed at them explicitly, try:

$ C_INCLUDE_PATH=/usr/include/lzo \
  LIBRARY_PATH=/usr/lib/x86_64-linux-gnu \
  mvn clean test

for example.)
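To avoid reading the whole dpkg-query listing by eye, a small filter can pull out the two directories. This is only a sketch: lzo_dirs is a hypothetical helper name, and it assumes the package ships a header named lzo1x.h and a library named liblzo2.so or liblzo2.a, which is what I see on my box.

```shell
# Hypothetical helper: read a file list on stdin (one path per line, as
# printed by `dpkg-query -L liblzo2-dev`) and print the directories that
# hold the LZO headers and the LZO library.
lzo_dirs() {
  list=$(cat)

  # First header named lzo1x.h (assumed to be shipped by the package).
  header=$(printf '%s\n' "$list" | grep '/lzo1x\.h$' | head -n 1)

  # First shared or static liblzo2 library in the listing.
  lib=$(printf '%s\n' "$list" | grep -E '/liblzo2\.(so|a)$' | head -n 1)

  [ -n "$header" ] && echo "include: $(dirname "$header")"
  [ -n "$lib" ] && echo "library: $(dirname "$lib")"
  return 0
}
```

It would be used as `dpkg-query -L liblzo2-dev | lzo_dirs`, and the two printed directories are the values to put in C_INCLUDE_PATH and LIBRARY_PATH if the build cannot find LZO on its own.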

Get the source from the Hadoop-LZO GitHub repo and build it:

$ git clone https://github.com/twitter/hadoop-lzo.git
$ cd hadoop-lzo
$ mvn clean test package

If the build is successful, the JAR should be found under target:

$ ls target/
...
hadoop-lzo-0.4.20-SNAPSHOT.jar
...

Typically, the JAR thus created should be installed in $HADOOP_HOME/share/hadoop/common/lib. The following properties then need to be added to the configuration files under $HADOOP_HOME/etc/hadoop.
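Before touching the configuration files, the copy step could be sketched roughly as follows. install_lzo_jar is a hypothetical helper name, and skipping -sources and -javadoc JARs is my own assumption about what else might sit in target; adjust to whatever the build actually produced.

```shell
# Hypothetical helper: copy the built hadoop-lzo JAR into Hadoop's
# common lib directory.
#   $1: build output directory (e.g. hadoop-lzo/target)
#   $2: Hadoop installation root (e.g. "$HADOOP_HOME")
install_lzo_jar() {
  build_dir="$1"
  hadoop_home="$2"
  dest="$hadoop_home/share/hadoop/common/lib"

  if [ ! -d "$dest" ]; then
    echo "destination not found: $dest" >&2
    return 1
  fi

  for jar in "$build_dir"/hadoop-lzo-*.jar; do
    # Assumption: only the main artifact is wanted, not sources or javadoc.
    case "$jar" in
      *-sources.jar|*-javadoc.jar) continue ;;
    esac
    # Skip the unexpanded pattern when nothing matched.
    [ -f "$jar" ] || continue
    cp "$jar" "$dest/" && echo "installed $(basename "$jar")"
  done
}
```

From inside the hadoop-lzo checkout this would be called as `install_lzo_jar target "$HADOOP_HOME"`. On a multi-node cluster the same copy presumably has to be repeated on every node.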

In core-site.xml:

<configuration>

  ... some other properties ...

  <property>
    <name>io.compression.codecs</name>
    <value>
      org.apache.hadoop.io.compress.DefaultCodec,
      org.apache.hadoop.io.compress.GzipCodec,
      org.apache.hadoop.io.compress.BZip2Codec,
      org.apache.hadoop.io.compress.DeflateCodec,
      org.apache.hadoop.io.compress.SnappyCodec,
      org.apache.hadoop.io.compress.Lz4Codec,
      com.hadoop.compression.lzo.LzoCodec,
      com.hadoop.compression.lzo.LzopCodec
    </value>
    </value>
  </property>

  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

</configuration>

In mapred-site.xml:

<configuration>

  ... some other properties ...

  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

</configuration>
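
After editing both files, a quick sanity check that the codec class name actually made it into them might look like this. check_lzo_config is a hypothetical helper name, and the check is purely textual: it only greps for the class name and does not parse the XML or prove that Hadoop can load the codec.

```shell
# Hypothetical helper: report whether core-site.xml and mapred-site.xml
# in the given configuration directory mention the LZO codec class.
#   $1: configuration directory (e.g. "$HADOOP_HOME/etc/hadoop")
check_lzo_config() {
  conf_dir="$1"
  status=0

  for f in core-site.xml mapred-site.xml; do
    # Textual match only; a missing or unreadable file counts as "missing".
    if grep -q 'com\.hadoop\.compression\.lzo\.LzoCodec' "$conf_dir/$f" 2>/dev/null; then
      echo "$f: LzoCodec configured"
    else
      echo "$f: LzoCodec missing"
      status=1
    fi
  done

  return $status
}
```

Called as `check_lzo_config "$HADOOP_HOME/etc/hadoop"`, it returns non-zero if either file lacks the entry, so it can be dropped into a deployment script.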