Hadoop on Single Node Cluster

29/07/2010

Hello there? S’up?

In my previous post, we learned how to develop a Hadoop MapReduce application in Netbeans. Now that our application runs well in Netbeans, it’s time to deploy it on a cluster of computers. Well, it’s supposed to be a multi-node cluster, but for now, let’s try it on a single-node cluster. This article gives a step-by-step guide on how to deploy a MapReduce application on a single-node cluster.

In this tutorial, I’m using Ubuntu 9.10 Karmic Koala. For the Hadoop MapReduce application, I’ll use the code from my previous post. You can build it yourself or just download the jar file. Are you ready? Let’s go then!

Preparing the Environment

First things first: we must prepare the deployment environment by installing and configuring all the required software. For this process, I followed a great tutorial by Michael Noll on running Hadoop on a single-node cluster. For simplicity, I’ll summarize all the steps from Michael’s post, but I do recommend reading it for the details.

Programming Hadoop in Netbeans

23/01/2010

Hadoop MapReduce is an open-source implementation of the MapReduce programming model for processing large-scale data in a distributed environment. Hadoop is implemented in Java as a class library. There are several Hadoop distributions, from Apache, Cloudera, and Yahoo!

Meanwhile, Netbeans is an integrated development environment (IDE) for programming in Java and many other languages. Like any other IDE, Netbeans has features that make developing applications easier and as painless as possible. In this case, it helps us develop Hadoop MapReduce jobs.

In this post, I’ll show you step by step how to use Netbeans to develop a Hadoop MapReduce job. I’m using Netbeans 6.8 on Ubuntu Karmic Koala. The MapReduce program we are going to create is a simple program called wordcount. It reads the text in a set of files and lists every word along with how many times it appears across all of the files. The source code of this program is available in the MapReduce tutorial packaged with the Apache Hadoop distribution.
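To make the wordcount idea concrete before opening Netbeans, here is a minimal in-memory sketch of the three phases a MapReduce job goes through: map (emit a `(word, 1)` pair per word), shuffle (group the pairs by word), and reduce (sum each word’s counts). This is plain Java only and is NOT the Hadoop API. The class name `WordCountSketch` and its methods are hypothetical, and splitting on whitespace is an assumption about how words are tokenized.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory sketch of wordcount's map, shuffle, and reduce phases.
 * Plain Java only; this does not use the Hadoop class library.
 */
public class WordCountSketch {

    // Map phase: split one line on whitespace and emit (word, 1) for each word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // leading whitespace yields an empty first token
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group every emitted value by its key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce for each word.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello hadoop", "hello netbeans hello");
        System.out.println(wordCount(lines)); // prints {hadoop=1, hello=3, netbeans=1}
    }
}
```

In a real Hadoop job, the map and reduce steps live in `Mapper` and `Reducer` subclasses. The framework performs the shuffle for you, spreading the work across the nodes of the cluster instead of a single in-memory list.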

This tutorial is divided into three steps. First, we will install Karmasphere Studio for Hadoop, a Netbeans extension. Then, we will write some code. Finally, we will run the MapReduce job in Netbeans. Okay, fasten your seat belt. Here we go!