Programming Hadoop in Eclipse (Inverted Index Examples)

It has been two years since I wrote about programming Hadoop in Netbeans using Karmasphere Studio. In the meantime, Karmasphere has apparently dropped support for Netbeans and focused on the other IDE, Eclipse. I have almost no problem using Eclipse, thanks to some Android projects that I’m working on right now. In this post, I’ll show you another example of programming Hadoop in Eclipse by implementing a distributed inverted index in MapReduce. So, let’s get started, shall we?
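As a rough illustration of what an inverted index is, here is a minimal sketch in plain Python. It does not use Hadoop or its Java API; the function names and the two sample documents are made up for this example. It only mimics the three steps of a MapReduce job: map emits (term, document) pairs, the shuffle groups them by term, and reduce builds the posting list.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per word, the way a mapper would."""
    for word in text.lower().split():
        yield word, doc_id


def reduce_phase(term, doc_ids):
    """Collapse all document ids seen for a term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(documents):
    # Shuffle step: group the mapper output by term. In a real job the
    # framework does this between the map and reduce phases.
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for term, emitted_id in map_phase(doc_id, text):
            grouped[term].append(emitted_id)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


# Hypothetical sample documents, only for illustration.
docs = {"doc1": "hadoop runs mapreduce", "doc2": "eclipse runs hadoop"}
index = build_inverted_index(docs)
```

With these two documents, looking up `index["hadoop"]` gives both document ids, while `index["eclipse"]` gives only `doc2`.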

Setting Up Mercurial for Netbeans Project

In software development, a version control system (VCS) plays an important role, especially when many programmers collaborate on the project. Besides keeping track of changes, version control helps handle task distribution and, later, the integration of the programmers’ work. Basically, there are two flavors of version control system: centralized and distributed. There are many comparisons between these two flavors on the net, and one of them explains it well with some illustrations. The key difference between the two is that a distributed VCS keeps local working copies of the project, while in a centralized VCS every change must be sent to the central repository.

Browse Anonymously with Tor

Universities and companies tend to have very strict Internet access policies. They usually deploy a proxy server to filter and deny access to websites that they think could be dangerous to them. Sometimes they just block several social networking sites or certain email providers to keep their network users from wasting time on what they call “unproductive things”. When I looked around for a way to get around this restriction, I stumbled upon a great tool called Tor.

Once Upon a Theme

When this blog was first created, I used free themes available on various sites. As time went by, I wanted this blog to have a personal touch, so I learned to create my own WordPress themes. I created some themes and installed them on this blog. Some people said they were good, but I personally thought that my themes didn’t have good cross-browser compatibility (although they worked on modern browsers).

Once again, I want to create a personal theme for this blog. Creating a theme from scratch is a pain. I must consider a lot of things, like cross-browser compatibility, new features from WordPress updates, the upcoming HTML standards, and many more. So I decided to look for a WordPress theme framework.

A quick search on Google turns up some great WordPress theme frameworks with many different interesting features. After reading about and trying some of them, my eyes settled on one: Gantry Framework. This theme framework is created by the Rocket Theme team for the WordPress and Joomla publishing platforms. It is easy to install, configure, and customize. Besides that, here are the features of Gantry that I like best.

  • based on the 960 grid system, which makes it flexible enough to create any layout you want.
  • widget-based layout, so you can put things in a widget and place it anywhere you want.
  • separation between the Gantry framework and the theme, so you can update the framework without breaking your theme design.
  • per-page customization, so you can create a different layout for the home page and for every page, post, category, etc.
  • it’s free, well it’s good :D

Another great thing about Gantry (and the other theme frameworks) is that we can customize our design incrementally, just like what I did with this blog. For now, I use the default Gantry style and have customized only the header and page background. I plan to customize the style further when I have the chance.

I’m not saying that the other frameworks are bad. It’s just that Gantry gets the job done for me. Whether Gantry works for you or not depends on your needs. You must try and find the framework that works for you. In the end, happy customizing your theme.. :D

Crunchbang Statler 10: First Look

I was excited when Canonical decided to remodel Ubuntu’s interface with the Unity shell. I spent some time installing and configuring it. There were still some bugs in it (at that time), so I decided to try GNOME 3. While I was using GNOME 3, I did some work with Hadoop: I used Netbeans and ran Hadoop to test my programs. My computer seemed to scream whenever I tested my Hadoop jobs. So I decided to use a minimalistic and lightweight desktop, something like LXDE or Openbox. After spending some time using LXDE in Linux Mint 11 Katya, I stumbled upon the website of a minimalistic, dark Linux distribution. The name of the distro is Crunchbang.

Running Hadoop Cluster in Netbeans

In the development phase of a Hadoop MapReduce program, you will need to test your program on a real cluster with a small amount of data to make sure that it works correctly. To do that, you must package your application into a jar file, then run it with the hadoop jar command in the terminal. Then you check the output directory of your program: are the outputs correct? If not, you must delete the output directory in HDFS, check and repair your program, then start the cycle again: build the jar, run Hadoop, check the output. Once or twice, that’s okay. But during development we will surely make a hell of a lot of mistakes in our program. Repeating the cycle of building the jar, running Hadoop, checking the output, and deleting the output directory can take a lot of time, not to mention the typos you make when typing Hadoop shell commands. To make this testing process easier, we can use Karmasphere, a Hadoop plugin for the Netbeans IDE. This article is about how to test your Hadoop program on a real cluster easily using Netbeans.
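To make the manual cycle concrete, here is a minimal Python sketch that only assembles the shell commands involved; it is not part of Karmasphere and not how the plugin works. The jar name, class name, and HDFS paths are hypothetical, the jar is assumed to be built already, and `hadoop fs -rmr` is the delete command from the Hadoop releases of that era (newer releases use `hadoop fs -rm -r`).

```python
import subprocess


def cycle_commands(jar_path, main_class, input_dir, output_dir):
    """Return the commands for one run / check / clean-up round."""
    return [
        # Run the job packaged in the jar on the cluster.
        ["hadoop", "jar", jar_path, main_class, input_dir, output_dir],
        # Print the job output; the glob is expanded by Hadoop, not the shell.
        ["hadoop", "fs", "-cat", output_dir + "/part-*"],
        # Delete the output directory so the next run does not fail on it.
        ["hadoop", "fs", "-rmr", output_dir],
    ]


def run_cycle(commands):
    """Execute the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical names, only for illustration.
commands = cycle_commands("wordcount.jar", "WordCount", "input", "output")
```

Calling `run_cycle(commands)` on a machine with Hadoop installed would execute the three steps in order; typing them by hand on every iteration is exactly the repetition described above.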

Quickly Switching Hadoop Mode

The Three Modes of Hadoop

As you may already know, we can configure and use Hadoop in three modes. These modes are:

Standalone mode

This is the default mode that you get when you download and extract Hadoop for the first time. In this mode, Hadoop doesn’t use HDFS to store input and output files; it just uses the local filesystem. This mode is very useful for debugging your MapReduce code before you deploy it on a large cluster and handle huge amounts of data. In this mode, Hadoop’s configuration file triplet (mapred-site.xml, core-site.xml, hdfs-site.xml) is still free of custom configuration.

Pseudo distributed mode (or single node cluster)

In this mode, we configure the configuration triplet to run on a single node. The replication factor of HDFS is one, because we use only one node as the NameNode, DataNode, JobTracker, and TaskTracker. We can use this mode to test our code on a real HDFS without the complexity of a fully distributed cluster. I’ve already covered the configuration process in my previous post.
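As a rough sketch of what the triplet might contain in this mode, the Python snippet below renders the three files from a dictionary. The property names are the ones used by Hadoop releases that have a JobTracker; the host name and ports (`localhost:9000`, `localhost:9001`) are assumptions commonly seen in single-node setups, not values taken from this post, and the helper name `render_site_file` is made up for this example.

```python
from xml.sax.saxutils import escape

# Assumed single-node values; adjust host and ports to your own setup.
PSEUDO_DISTRIBUTED = {
    "core-site.xml": {"fs.default.name": "hdfs://localhost:9000"},
    "hdfs-site.xml": {"dfs.replication": "1"},  # one node, so one replica
    "mapred-site.xml": {"mapred.job.tracker": "localhost:9001"},
}


def render_site_file(properties):
    """Render a Hadoop *-site.xml document from a name -> value mapping."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


rendered = {
    filename: render_site_file(properties)
    for filename, properties in PSEUDO_DISTRIBUTED.items()
}
```

Passing an empty dictionary to `render_site_file` produces an empty `<configuration>` element, which corresponds to the untouched files of standalone mode.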

Fully distributed mode (or multiple node cluster)

In this mode, we use Hadoop at its full scale. We can use a cluster consisting of a thousand nodes working together. This is the production phase, where your code and data are distributed across many nodes. You use this mode when your code is ready and works properly in the previous modes.