1. Execute C++ on Hadoop
Hello! In this tutorial you will see how to set up and run Hadoop jobs in native C++.
This can be quite tricky, but don't worry: I will walk you through all the steps to get you up and running.
First you need a Hadoop environment. I recommend downloading Cloudera's virtual machine, which you can run in VirtualBox.
You can download the virtual machine here (a large download, 4.2 GB).
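Once the VM has booted, it is worth confirming that the Hadoop client is actually on your PATH before installing anything else. This is a minimal sketch: the helper name have_cmd is my own invention, while "hadoop version" is the standard Hadoop CLI subcommand that prints the installed release.

```shell
#!/usr/bin/env bash
# Sketch: check that the VM gives you a working Hadoop client
# before you start installing MR4C's dependencies.

# have_cmd NAME -> exit status 0 if NAME is found on PATH (hypothetical helper).
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd hadoop; then
  # Prints the Hadoop release that ships with the VM.
  hadoop version
else
  echo "hadoop is not on PATH; check that you are logged into the VM."
fi
```

If the check fails inside the VM, fix that first; nothing in the next chapter will help without a working Hadoop install.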
2. Preparing our environment
If you downloaded the VM from the previous chapter, you need to make some preparations to it before you can get MR4C, the framework we will use to run native C++ code on Hadoop, up and running. Work through the list below from top to bottom so that everything is set up correctly.
Here is the list of bash commands:
|| NOTE: do not copy the lines starting with ||, they are comments. Every other line is a separate command you need to execute. The CentOS commands assume you are logged in as root.
|| First update the system:
yum -y update
|| Then install the development tools:
yum groupinstall "Development Tools"
|| Then check the gcc version:
gcc --version
|| If the version is older than 4.6.3, use the following section to install a newer one; otherwise skip it.
||============================================== GCC 4.6.3
|| Download the sources:
wget https://ftp.gnu.org/gnu/gmp/gmp-4.3.2.tar.bz2
wget https://ftp.gnu.org/gnu/mpfr/mpfr-2.4.2.tar.bz2
wget https://pkgs.fedoraproject.org/repo/pkgs/libmpc/mpc-0.8.1.tar.gz/5b34aa804d514cc295414a963aedb6bf/mpc-0.8.1.tar.gz
wget https://ftp.gnu.org/gnu/gcc/gcc-4.6.3/gcc-4.6.3.tar.bz2
|| 1. Build and install gmp:
tar jxf gmp-4.3.2.tar.bz2
cd gmp-4.3.2/
./configure --prefix=/usr/local/gmp
make && make install
cd ..
|| 2. Build and install mpfr:
tar jxf mpfr-2.4.2.tar.bz2
cd mpfr-2.4.2/
./configure --prefix=/usr/local/mpfr --with-gmp=/usr/local/gmp
make && make install
cd ..
|| 3. Build and install mpc:
tar xzf mpc-0.8.1.tar.gz
cd mpc-0.8.1
./configure --prefix=/usr/local/mpc --with-mpfr=/usr/local/mpfr --with-gmp=/usr/local/gmp
make && make install
cd ..
|| 4. Build and install gcc 4.6.3:
tar jxf gcc-4.6.3.tar.bz2
cd gcc-4.6.3
./configure --prefix=/usr/local/gcc --enable-threads=posix --disable-checking --disable-multilib --enable-languages=c,c++ --with-gmp=/usr/local/gmp --with-mpfr=/usr/local/mpfr/ --with-mpc=/usr/local/mpc/
|| Make sure configure finished without errors, then continue with:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/mpc/lib:/usr/local/gmp/lib:/usr/local/mpfr/lib/
make && make install
|| Now add the linker configuration. Create a new file:
nano /etc/ld.so.conf.d/gcc.4.6.3.conf
|| and paste the following four lines into it (these are file contents, not commands):
    /usr/local/gcc/lib/
    /usr/local/mpc/lib/
    /usr/local/gmp/lib/
    /usr/local/mpfr/lib/
|| Save the file, then run these commands:
ldconfig
mv /usr/bin/gcc /usr/bin/gcc_old
mv /usr/bin/g++ /usr/bin/g++_old
mv /usr/bin/c++ /usr/bin/c++_old
ln -s -f /usr/local/gcc/bin/gcc /usr/bin/gcc
ln -s -f /usr/local/gcc/bin/g++ /usr/bin/g++
ln -s -f /usr/local/gcc/bin/c++ /usr/bin/c++
cp /usr/local/gcc/lib64/libstdc++.so.6.0.16 /usr/lib64/.
mv /usr/lib64/libstdc++.so.6 /usr/lib64/libstdc++.so.6.bak
ln -s -f /usr/lib64/libstdc++.so.6.0.16 /usr/lib64/libstdc++.so.6
|| NOTE: if the first command (renaming gcc to gcc_old) fails, your compilers live in /usr/local/bin/ instead of /usr/bin/. In that case use these commands instead:
mv /usr/local/bin/gcc /usr/local/bin/gcc_old
mv /usr/local/bin/g++ /usr/local/bin/g++_old
mv /usr/local/bin/c++ /usr/local/bin/c++_old
ln -s -f /usr/local/gcc/bin/gcc /usr/local/bin/gcc
ln -s -f /usr/local/gcc/bin/g++ /usr/local/bin/g++
ln -s -f /usr/local/gcc/bin/c++ /usr/local/bin/c++
cp /usr/local/gcc/lib64/libstdc++.so.6.0.16 /usr/lib64/.
mv /usr/lib64/libstdc++.so.6 /usr/lib64/libstdc++.so.6.bak
ln -s -f /usr/lib64/libstdc++.so.6.0.16 /usr/lib64/libstdc++.so.6
|| Running gcc --version now should report gcc 4.6.3.
||============================================== end of GCC 4.6.3 section
|| Proceed with installing Java and the other dependencies:
wget https://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
rpm -ivh epel-release-6-8.noarch.rpm
yum install ant
yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel
yum install cppunit cppunit-devel
yum install libpng
yum install libtiff
yum install git
|| Install log4cxx by following https://www.yolinux.com/TUTORIALS/Log4cxx.html
|| Install jansson:
git clone https://github.com/akheron/jansson.git
cd jansson
autoreconf -i
./configure
make
make install
cd ..
|| Install proj and gdal (download, then build each one with the usual configure, make, make install):
wget ftp://ftp.remotesensing.org/proj/proj-4.8.0.tar.gz
wget https://download.osgeo.org/gdal/1.10.0/gdal-1.10.0.tar.gz
tar xzf proj-4.8.0.tar.gz
cd proj-4.8.0
./configure
make && make install
cd ..
tar xzf gdal-1.10.0.tar.gz
cd gdal-1.10.0
./configure
make && make install
cd ..
ldconfig
|| Install Apache Ivy:
git clone --recursive https://github.com/apache/ant-ivy
cd ant-ivy
ant jar
|| Copy the resulting ivy.jar to /root/.ant/lib, then go back up:
cd ..
|| After all that is done, you can install MR4C:
git clone --recursive https://github.com/google/mr4c
cd mr4c
./build_all
./deploy_all
cd test
make
./test_mr4c.sh
|| Done.
||==============================================
|| If you want to deploy on Ubuntu instead, you can install the dependencies like this:
sudo apt-get install ant python-software-properties liblog4cxx10 liblog4cxx10-dev build-essential g++ libjansson libjansson-dev libcppunit-dev libcppunit binutils libproj-dev gdal-bin
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install oracle-java8-set-default
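To decide whether you need the optional GCC 4.6.3 section, you can let the shell compare versions instead of reading the gcc --version output by eye. This is a sketch that assumes GNU sort with the -V (version sort) option, which GNU coreutils provides; the function name version_ge is hypothetical and not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the optional GCC 4.6.3 section is needed.
# Assumption: GNU sort supports -V (version sort).

REQUIRED_GCC="4.6.3"

# version_ge A B -> exit status 0 if version A >= version B (hypothetical helper).
version_ge() {
  # sort -V compares each dot-separated component numerically.
  # If B sorts first (or A equals B), then A >= B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v gcc >/dev/null 2>&1; then
  # -dumpversion prints only the version string, without the banner.
  current="$(gcc -dumpversion)"
  if version_ge "$current" "$REQUIRED_GCC"; then
    echo "gcc $current is new enough; skip the GCC $REQUIRED_GCC section."
  else
    echo "gcc $current is older than $REQUIRED_GCC; build GCC $REQUIRED_GCC first."
  fi
else
  echo "gcc not found; install the Development Tools group first."
fi
```

Run it before starting the GCC section. It also works afterwards as a check that the new compiler is the one on your PATH, since the symlinked gcc should then report 4.6.3.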
3. Running our first job!!
It's finally time to get some fun jobs up and running. Please join me in the following tutorial.