Running a Hadoop Cluster in Netbeans

In the development phase of a Hadoop MapReduce program, you will often test your program on a real cluster with small data to make sure that it's working correctly. To do that, you must package your application into a jar file, then run it with the hadoop jar command on the terminal. Then you check the output directory of your program: are the outputs correct? If not, you must delete the output directory in HDFS, fix your program, and go through the build jar – run Hadoop – check output cycle again. Once or twice, that's okay. But in the development process, we will surely make a lot of mistakes in our program, and repeating the build jar – run Hadoop – check output – delete output directory cycle can take a lot of time. Not to mention the typos when you interact with the Hadoop shell commands. To make this testing process easier, we can use Karmasphere: a Hadoop plugin for the Netbeans IDE. This article is about how to test your Hadoop program on a real cluster easily using Netbeans.
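
For reference, a single iteration of that manual cycle looks roughly like this on the terminal (the jar name, driver class, and paths here are placeholders for illustration, not from an actual project):

    hadoop jar myjob.jar com.example.MyDriver input output   # run the job
    hadoop fs -cat output/part-*                             # inspect the result
    hadoop fs -rmr output                                    # delete the output before the next run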

Prerequisites

Before we experiment with running a Hadoop program on a cluster using Netbeans, there are some things that we must prepare. First, you must have Karmasphere installed in your Netbeans IDE. If you haven't installed it yet, you can read how to do it in my previous post about programming Hadoop in Netbeans. After installing Karmasphere, you must also have a Hadoop cluster configured. A single node cluster is enough; its setup differs only slightly from a multinode cluster. To configure a single node cluster, you can read my previous post on how to do it.

In this article, I'm using Ubuntu 10.10 Maverick Meerkat and Java 6 OpenJDK. Let's assume that I have a folder called input in my HDFS, so the path to the input folder is /user/hadoop/input. Inside this folder, I have a text document with some text in it. The text is up to you. To create a folder and place a text document there, you can use the Hadoop filesystem command -copyFromLocal. Please refer to my single node cluster post on how to use it. If you're ready, let's get started!
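
If you haven't prepared the input yet, the commands below create the folder and copy a local text file into it (the local file path is just an example; run them as the hadoop user so the folder lands under /user/hadoop):

    hadoop fs -mkdir input                          # creates /user/hadoop/input
    hadoop fs -copyFromLocal /tmp/mytext.txt input  # copy a local text file into it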

Registering HDFS to Netbeans

We will use the Karmasphere plugin in Netbeans to do this, so make sure you have already installed it. This is how to do it:

Go to the Services tab on the left side of your IDE. Right click on Hadoop Filesystem and choose New Filesystem...

A window will appear. First, we must choose our Filesystem type. Because we're using a local single node cluster, we choose Hadoop HDFS Filesystem. You can name the filesystem anything you want. This is my configuration at this step. Click Next when you're ready.

Next, we configure the HDFS Filesystem. For the NameNode host, you can use localhost if you're using a single node cluster. If you're using a multinode cluster, you should use your master hostname. Fill in the NameNode port with the port specified in your Hadoop configuration file core-site.xml. This is my configuration at this step. Click Next when you're finished.
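
Not sure which port to use? The NameNode URI lives in the fs.default.name property of core-site.xml, so you can look it up like this (the path assumes Hadoop is installed in /usr/local/hadoop, as later in this post, and 54310 is just the value commonly used in single node tutorials):

    grep -A 1 "fs.default.name" /usr/local/hadoop/conf/core-site.xml
    # typical single node output:
    #   <name>fs.default.name</name>
    #   <value>hdfs://localhost:54310</value>
    # here the NameNode host is localhost and the port is 54310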

The next step is configuring the Proxy and Firewall. This feature is only available in the Professional Edition of Karmasphere, so we'll just click Finish.

After you configure the Hadoop Filesystem, your HDFS will appear below the Hadoop Filesystem category. Right click on it and choose Browse. A new tab will open and show you the files and directories inside your HDFS.

Registering Hadoop Cluster in Netbeans

Now that we have successfully added our HDFS Filesystem, let's continue to the next step. We will register our Hadoop cluster in Netbeans, so that we can run Hadoop jobs on our cluster from Netbeans. This is how to do it.

Still on the Services tab, right click on Hadoop Clusters and pick New Cluster…

A window will appear. Enter a cluster name to your liking. For the cluster type, we'll choose Hadoop Cluster (JobTracker) since we're using a local cluster. Pick the Hadoop version that you want to use, and finally set the default Filesystem to the Filesystem that we created before. This is what my window looks like:

After that, we will configure the remote host. Fill in the JobTracker host with your master node host. In a single node cluster, the master node host is localhost. In a multinode cluster, fill in the hostname of your master node. Then enter your JobTracker port as in your Hadoop configuration file and the appropriate user name. This is my configuration:
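
The JobTracker address is defined by the mapred.job.tracker property in mapred-site.xml, so you can look it up the same way as the NameNode port (again, the installation path and the port 54311 are just the values typical single node setups use):

    grep -A 1 "mapred.job.tracker" /usr/local/hadoop/conf/mapred-site.xml
    # typical single node output:
    #   <name>mapred.job.tracker</name>
    #   <value>localhost:54311</value>
    # here the JobTracker host is localhost and the port is 54311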

The next configuration needs Karmasphere Professional, so we can just click Finish. The configured cluster will appear in the Hadoop Clusters section.

Running Hadoop Job on Cluster

Now that our Hadoop Filesystem and Cluster are ready, let's try running a Hadoop Job from Netbeans. In this part, I'll use the Hadoop wordcount example located in hadoop-0.20.2-examples.jar in the Hadoop installation directory. Please make sure that your HDFS already has a text document as our input.
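
A quick way to double-check that the input is in place (the path matches the setup assumed earlier in this post):

    hadoop fs -ls /user/hadoop/input   # should list your text document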

On the Services tab, right click on Hadoop Jobs and pick New Job...

A window will appear. Enter WordCount as the Job Name and pick Hadoop Job from pre-existing JAR file as the Job Type. This is my window:

Next, in the configure job window, we set the jar file and the main class we want to use. We're using the wordcount example from the Hadoop distribution, so find hadoop-0.20.2-examples.jar in your Hadoop installation directory. In my case, that'll be /usr/local/hadoop/hadoop-0.20.2-examples.jar. Pick ExampleDriver (full class name org.apache.hadoop.examples.ExampleDriver) as the main class. This is my window at this step.

Next, we set the cluster that we want to use and the default arguments that will be passed to the main method. We pick the cluster that we defined earlier. In the arguments field, fill in: wordcount input output. This is my window at this step.
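
This configuration is equivalent to running the example by hand on the terminal; the first argument, wordcount, tells ExampleDriver which example program to run, followed by the input and output directories:

    hadoop jar /usr/local/hadoop/hadoop-0.20.2-examples.jar wordcount input output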

Next, we'll customize the classpath. The default is fine, so go on to the next configuration. Click Finish when you're done.

After we configure the Job, let's run it. Right click on the WordCount Job and pick Run Job... A window will appear. Check the cluster and filesystem that you want to use. When you're ready, click Run. Netbeans will run your Job on the configured cluster and filesystem. After the Job finishes successfully, you can check your Hadoop Filesystem tab: the output directory will appear in it. You can open the resulting text document in Netbeans. This is how it looks:
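
You can also inspect the result from the terminal. The wordcount example writes its counts into part files inside the output directory (the exact file name, such as part-r-00000, depends on which MapReduce API the example uses):

    hadoop fs -ls output
    hadoop fs -cat output/part-*   # each line is a word and its count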

Okay, now you can run your MapReduce Job in Netbeans!! Congratulations.. :D

9 thoughts on “Running a Hadoop Cluster in Netbeans”

  1. Hi! This is a great tutorial. I tried it with Netbeans 7.0rc1 and it works just great, modulo the graphical glitches (I assume they're from Netbeans 7). Question: do you know why Karmasphere keeps sending the hadoop-site.xml, which results in the warning that its use is deprecated? I am using Hadoop 0.20.2 and do not have that file on my cluster, so it must come from this plugin.

    1. Yeah.. I always get that message too… maybe it comes from the library included in the plugin..
      glad you like my post.. :D

  2. Hello arif sir,
    I am new to Hadoop and I want to ask how I can work with Netbeans. I already downloaded all the plugins as per your last tutorial, but it says it's good for Unix operating systems and I am using Windows.. so please help me out. sanghanialpesh@gmail.com

    1. Well.. I have no experience using it on Windows. I suggest you check the Karmasphere site for more information about how to make it run on Windows.
      But if you insist, you can install Linux on your Windows machine using a virtual machine such as VirtualBox and play with Hadoop there..
      Good luck!

  3. Hello arif sir,

    I am new to the MapReduce field. I have a problem registering HDFS in Netbeans: I don't know how to add the files inside the HDFS we are creating. Please help me out. agilashekar@gmail.com

  4. Hi mr Arif.
    Why did you use wordcount in Default Arguments?
    Should the main class be org.apache.hadoop.examples.ExampleDriver,
    or just ExampleDriver?

    I got this error: “org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:54310/user/sina/in”

  5. I forgot, but I think I used org.apache.hadoop.examples.ExampleDriver as the main class. The ExampleDriver class contains several example MapReduce programs; one of them is wordcount, which is activated by passing wordcount as the argument.

    The error shows that Hadoop can't find your input path. So check the input path, or create it if it doesn't exist yet..

    Good luck :D

  6. Great information, and thank you very much for this tutorial, as I couldn't find this kind of free information anywhere else. Again, thank you, and I hope other people who hit the same problem will find this one clear and easy to understand.
    Best all
