During development of a Hadoop MapReduce program, you will want to test it on a real cluster with a small data set to make sure it works correctly. To do that, you must package your application into a jar file, then run it with the hadoop jar command in the terminal. Then you check the output directory of your program: are the outputs correct? If not, you must delete the output directory in HDFS, fix your program, and start the build jar – run Hadoop – check output cycle again. Once or twice, that's okay. But during development we will surely make a lot of mistakes in our program, and repeating the build jar – run Hadoop – check output – delete output directory cycle can take a lot of time. Not to mention the typos when you interact with the Hadoop shell commands. To make this testing process easier, we can use Karmasphere, a Hadoop plugin for the Netbeans IDE. This article is about how to easily test your Hadoop program on a real cluster using Netbeans.
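For reference, the manual cycle looks something like this (the jar name, class name, and paths here are only examples, and the commands assume a Hadoop 0.20-style installation with the hadoop script on your PATH):

```shell
# Build your jar in your IDE or with Ant, then run it on the cluster:
hadoop jar wordcount.jar WordCount input output

# Inspect the results:
hadoop fs -cat output/part*

# Found a bug? Delete the output directory, fix the code, and repeat:
hadoop fs -rmr output
```

Doing this by hand after every small change is exactly the tedium that Karmasphere removes.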
Before we experiment with running a Hadoop program on a cluster using Netbeans, there are some things we must prepare. First, you must already have Karmasphere installed in your Netbeans IDE. If you haven't installed it yet, you can read how to do it in my previous post about programming Hadoop in Netbeans. After installing Karmasphere, you must also have a Hadoop cluster configured. A single node cluster is enough; its setup has only tiny differences from a multinode cluster. To configure a single node cluster, you can read my previous post on how to do it.
In this article, I'm using Ubuntu 10.10 Maverick Meerkat and Java 6 OpenJDK. Let's assume that I have a folder called input in my HDFS, so the path to the input folder is
/user/hadoop/input. Inside this folder, I have a text file with some text in it. The text is up to you. To create the folder and place a text file there, you can use the Hadoop filesystem command -copyFromLocal. Please refer to my single node cluster post on how to use it. If you're ready, let's get started!
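If your input folder isn't set up yet, it can be created like this (the local file name sometext.txt is just an example; use any text file you have):

```shell
# Create the input directory in HDFS:
hadoop fs -mkdir /user/hadoop/input

# Copy a local text file into it:
hadoop fs -copyFromLocal sometext.txt /user/hadoop/input

# Verify that the file arrived:
hadoop fs -ls /user/hadoop/input
```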
Registering HDFS to Netbeans
We will use the Karmasphere plugin in Netbeans to do this, so make sure you have it installed. This is how to do it:
Go to the Services tab on the left side of your IDE. Right click on Hadoop Filesystem and choose New Filesystem...
A window will appear. First, we must choose our Filesystem type. Because we're using a local single node cluster, we choose Hadoop HDFS Filesystem. You can name the filesystem anything you want. This is my configuration in this step. Click Next when you're ready.
Next, we configure the HDFS Filesystem. For the NameNode host, you can use localhost if you're using a single node cluster. If you're using a multinode cluster, you should use your master's hostname. Fill in the NameNode port with the port specified in your Hadoop configuration file
core-site.xml. This is my configuration in this step. Click Next when you're finished.
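The NameNode host and port come from the fs.default.name property in core-site.xml. In a typical single node setup it looks something like this (54310 is a common choice in single node tutorials, but use whatever port your own file specifies):

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
</property>
```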
The next step is configuring Proxy and Firewall. This feature is only available on the Professional Edition of Karmasphere. So we’ll just click Finish.
After you configure the Hadoop Filesystem, your HDFS will appear below the Hadoop Filesystem category. Right click on it and choose Browse. A new tab will open and show you the files and directories inside your HDFS.
Registering Hadoop Cluster in Netbeans
After successfully adding our HDFS Filesystem, let's continue to the next step. We will register our Hadoop cluster in Netbeans, so that we can run Hadoop jobs on our cluster from Netbeans. This is how to do it.
Still on the Services tab, right click on Hadoop Clusters and pick New Cluster…
A window will appear. Enter a cluster name to your liking. For the cluster type, we'll choose Hadoop Cluster (JobTracker) since we're using a local cluster. Pick the Hadoop version that you want to use, and finally set the default Filesystem to the Filesystem that we created before. This is what my window looks like:
After that, we will configure the remote host. Enter the JobTracker host with your master node's hostname. In a single node cluster, the master node host is localhost. In a multinode cluster, fill in the hostname of your master node. Then, enter your JobTracker port as specified in your Hadoop configuration file and the appropriate user name. This is my configuration:
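The JobTracker host and port are defined by the mapred.job.tracker property in mapred-site.xml. In a typical single node setup it looks something like this (54311 is just an example port; check your own configuration):

```xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
```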
The next configuration needs Karmasphere Professional, so we can just click Finish. The configured cluster will appear in the Hadoop Clusters section.
Running Hadoop Job on Cluster
Now that our Hadoop Filesystem and Cluster are ready, let's try running a Hadoop job from Netbeans. In this part, I'll use the Hadoop wordcount example located in hadoop-0.20.2-examples.jar in the Hadoop installation directory. Please make sure that your HDFS already has a text file as our input.
On the Services tab, right click on Hadoop Jobs and pick New Job...
A window will appear. Enter WordCount as the Job Name and pick Hadoop Job from pre-existing JAR file as the Job Type. This is my window:
Next, in the configure job window, we set the jar file and the main class we want to use. We're using the wordcount example from the Hadoop distribution, so find
hadoop-0.20.2-examples.jar in your Hadoop installation directory. In my case, that's
/usr/local/hadoop/hadoop-0.20.2-examples.jar. Pick ExampleDriver as the main class. This is my window at this step.
Next, we set the cluster that we want to use and the default arguments that will be passed to the main method. We pick the cluster that we defined earlier. Fill the argument field with:
wordcount input output. This is my window at this step.
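These arguments are the same ones you would pass on the command line; running the job from Netbeans with them should be equivalent to:

```shell
hadoop jar /usr/local/hadoop/hadoop-0.20.2-examples.jar wordcount input output
```

The first argument, wordcount, tells ExampleDriver which example program to run; input and output are the HDFS paths for the job's input and output.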
Next, we’ll customize the classpath. The default is fine so go to the next configuration. Click Finish if you’re done.
After we configure the Job, let's run it. Right click on the WordCount Job and pick Run Job... A window will appear. Check the cluster and filesystem that you want to use. If it's ready, click Run. Netbeans will run your Job on your configured cluster and filesystem. After the Job finishes successfully, you can check your Hadoop Filesystem tab: the output directory will appear in it. You can open the result file in Netbeans. This is how it looks:
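If you prefer, you can also inspect the results from the terminal (the part* pattern matches the reducer output files regardless of their exact names):

```shell
# List the output directory:
hadoop fs -ls output

# Print the word counts:
hadoop fs -cat output/part*

# Delete the output directory if you want to re-run the job:
hadoop fs -rmr output
```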
Okay, now you can run your MapReduce Job from Netbeans! Congratulations! :D