Hadoop MapReduce is an Open Source implementation of the MapReduce programming model for processing large-scale data in a distributed environment. Hadoop is implemented in Java as a class library. There are several Hadoop distributions, from Apache, Cloudera, and Yahoo!
Meanwhile, NetBeans is an integrated development environment (IDE) for programming in Java and many other languages. Like any other IDE, NetBeans has features that make developing applications easier and as painless as possible. In this case, it helps us develop Hadoop MapReduce jobs.
In this post, I’ll show you step by step how to use NetBeans to develop a Hadoop MapReduce job. I’m using NetBeans 6.8 on Ubuntu Karmic Koala. The MapReduce program we are going to create is a simple one called wordcount. It reads the text in a set of files and lists every word together with how many times it appears across all of them. The source code of this program is available in the MapReduce tutorial packaged with the Apache Hadoop distribution.
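Before touching the IDE, it may help to see what wordcount actually computes. Below is a minimal plain-Java sketch of the data flow, under my own assumptions: the map step emits a (word, 1) pair for every whitespace-separated word, the pairs are grouped by word (what Hadoop calls the shuffle), and the reduce step sums the counts. It deliberately does not use the Hadoop API (no Mapper or Reducer classes), so the class and method names (`WordCountSketch`, `map`, `reduce`, `wordCount`) are hypothetical, and everything runs in a single JVM (Java 9+ because of `List.of`).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-JVM sketch of the wordcount data flow; NOT the Hadoop API.
public class WordCountSketch {

    // Map step: emit (word, 1) for every whitespace-separated token in one line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) { // leading whitespace yields an empty first token
                out.add(new SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Reduce step: sum all counts emitted for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"), then reduce
    // each group. TreeMap keeps the keys sorted, like Hadoop's sort before reduce.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Each string stands in for the contents of one input file.
        List<String> files = List.of("hello world bye world", "hello hadoop goodbye hadoop");
        System.out.println(wordCount(files));
        // prints {bye=1, goodbye=1, hadoop=2, hello=2, world=2}
    }
}
```

In real Hadoop, the map and reduce steps run on different machines and the framework does the grouping over the network. The point of this sketch is only that your job code boils down to those two small functions.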
We divide this tutorial into three steps. First, we will install Karmasphere Studio for Hadoop, a NetBeans extension. Then, we will write some code. Finally, we will run the MapReduce job in NetBeans. Okay, fasten your seat belt. Here we go.
On my college department’s mailing list, there is an interesting discussion about the quality of IT bachelor’s graduates in the workplace. Several concerns were raised:
- Bachelor’s graduates lack practical skills. They cannot answer fundamental questions that every IT or computer science graduate should know.
- Bachelor’s graduates also lack soft skills, like how to speak with higher-ups and communicate with other workers.
As a result, companies prefer to hire vocational IT graduates. Why?
- Vocational graduates sometimes have practical skills that bachelor’s graduates don’t. Computer science and IT knowledge is widely available: you don’t have to go to college just to learn how to program. It’s all over the cloud, so the learning material is within everyone’s reach.
- Vocational graduates are easier to manage. Some of them show more respect to higher-ups than bachelor’s graduates do.
- The standard salary for vocational graduates is lower than for bachelor’s graduates. Combine this with better skills and more respect, and bachelor’s graduates’ jobs are in grave danger.
Howdy,
When I was in college, I tried to implement a Web Map Service (WMS) and a Web Feature Service (WFS) as a foundation for a distributed Geographic Information System (better known as GIS). My academic advisor at that time told me that the idea was not entirely new, but that a lot of people still didn’t know about it. So by making it my thesis topic, he hoped that one day people would know about this technology.
The implementation I made was quite simple, actually. But let me tell you the complete story. At first, I was thinking about developing geographic operations that could be invoked via web services in the cloud. After some weeks of analysis and information gathering, I found out that this work could be really hard and time-consuming. I didn’t have a background in geography (I’m a computer science student), and I didn’t have much time before the next graduation. In the end, I just created a spatial data repository and made it accessible across the network using GeoServer, an Open Source implementation of WMS and WFS. Then I created a simple web application to pull the spatial data and display it in the browser. I also provided a simple data update feature, using one of WFS’s features. I used OpenLayers to build the application. It’s really simple, actually.
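To give a feel for what "pulling the spatial data" from a WMS server looks like, here is a minimal sketch that only builds a WMS 1.1.1 GetMap request URL, the kind of request a client like OpenLayers sends to GeoServer to get a map image. The endpoint `http://localhost:8080/geoserver/wms`, the layer name `topp:states`, and the bounding box are assumptions for illustration, not my thesis setup, and the class and method names are hypothetical. It needs Java 10+ for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds a WMS 1.1.1 GetMap URL; it does not send the request.
public class WmsGetMapSketch {

    static String getMapUrl(String endpoint, String layer,
                            double minX, double minY, double maxX, double maxY,
                            int width, int height) {
        // LinkedHashMap keeps the parameters in insertion order for a readable URL.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("SERVICE", "WMS");
        params.put("VERSION", "1.1.1");
        params.put("REQUEST", "GetMap");
        params.put("LAYERS", layer);
        params.put("STYLES", "");        // empty value = the layer's default style
        params.put("SRS", "EPSG:4326");  // WGS 84; in WMS 1.1.1 the BBOX order is lon,lat
        params.put("BBOX", minX + "," + minY + "," + maxX + "," + maxY);
        params.put("WIDTH", Integer.toString(width));
        params.put("HEIGHT", Integer.toString(height));
        params.put("FORMAT", "image/png");

        // Percent-encode each key and value, then join them with '&'.
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return endpoint + "?" + query;
    }

    public static void main(String[] args) {
        // Hypothetical local GeoServer endpoint and sample layer.
        String url = getMapUrl("http://localhost:8080/geoserver/wms", "topp:states",
                -125.0, 24.0, -66.0, 50.0, 800, 400);
        System.out.println(url);
    }
}
```

Pasting the printed URL into a browser should return a PNG if such a server and layer exist. WFS works in the same request/response style, except that it returns the features themselves (for example as GML) instead of a rendered image.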

Now, in my graduate study, I want to try something entirely different. I want to explore MapReduce, a programming model for processing large-scale data in a distributed environment. I heard about this model on some mailing lists and websites, and I was surprised that the paper [pdf], the lecture notes, and the videos are easy to get. So, for the time being, I decided to do some experiments in order to learn about it.
It’s still just a plan in my head. I haven’t talked about it with my thesis advisor (because I don’t have one yet). But I can predict some problems I will have to deal with if I pursue this research plan:
- The case. I don’t have any idea what problem I should solve with this research. My college advisor suggested doing something in bioinformatics, like genome assembly. I think I will consider it, but I’m open to ideas.
- The machines and the network. The lab is always busy with other graduate students. Fortunately, a friend of mine told me that there is another place on campus where I can do experiments, but I need to write a permission letter first. Okay, I’ll do it.
In the meantime, I’ll focus on learning MapReduce. Maybe I’ll post something about it on this blog. If you have a suggestion about what I should do with this programming model, let me know. I’d be really glad to hear it.
Credits:
EDSAC pictures, copyright Computer Laboratory, University of Cambridge, licensed under the Creative Commons Attribution 2.0 Generic license.
Howdy,
It has been a long time since my last post on this blog. I had just written some posts on my old blog when suddenly it was already 2010. Wow, time really waits for no one. So in this post I want to sum up what happened last year. Well, I didn’t actually do anything big, but I made some progress in life, I guess.
Here goes. I spent my first months of 2009 in my homeland, recovering from my illness and successfully gaining some weight. I made some themes for this blog, experimented with domains and paid hosting, and kept up my virtual social life on Facebook and Twitter. Anything that could be done at home, I did. Oh yeah, I also created a simple blog aggregator (usually called a planet) to list all of my classmates’ blog posts.