It has been two years since I wrote about programming Hadoop in Netbeans using Karmasphere Studio. Since then, Karmasphere has apparently dropped support for Netbeans and focused on the other IDE, Eclipse. I have little trouble using Eclipse, thanks to some Android projects that I’m working on right now. In this post, I’ll show you another example of programming Hadoop in Eclipse by implementing a distributed inverted index in MapReduce. So, let’s get started, shall we?
In software development, a version control system (VCS) plays an important role, especially when many programmers collaborate on a project. Besides keeping track of changes, version control helps with distributing tasks among the programmers and integrating their work later. Basically, there are two flavors of version control system: centralized and distributed. There are many comparisons of these two flavors on the net; one of them explains the difference well with some illustrations. The key difference is that in a distributed VCS each programmer keeps a local copy of the project repository, while in a centralized VCS every change must be committed to the central repository.
A few days ago, a job vacancy was posted on my undergraduate department's mailing list. A company was looking for a programmer. I didn’t pay much attention to the email. Okay, here’s the email:
Mr. XXX, my office needs a programmer with these qualifications:
- Have knowledge in VB, Java, and PHP
- Have at least 1 year of experience as a programmer/developer in an IT division, IT company, or software house
- Have an ability to give product presentation to potential clients
- Have knowledge in CorelDRAW and Photoshop
- Have knowledge in Linux
- Have knowledge in building computer networks
- Have knowledge in hardware
Note: It seems that Karmasphere Studio no longer supports Netbeans. To program Hadoop in Eclipse, you can read about it here.
Hadoop MapReduce is an open source implementation of the MapReduce programming model for processing large-scale data in a distributed environment. Hadoop is implemented in Java as a class library. There are several Hadoop distributions, from Apache, Cloudera, and Yahoo!
Meanwhile, Netbeans is an integrated development environment (IDE) for programming in Java and many other languages. Like any other IDE, Netbeans offers features that make developing applications easier and as painless as possible. In this case, it helps us develop Hadoop MapReduce jobs.
In this post, I’ll show you step by step how to use Netbeans to develop a Hadoop MapReduce job. I’m using Netbeans 6.8 on Ubuntu Karmic Koala. The MapReduce program we are going to create is a simple one called wordcount. It reads the text in a set of files and lists every word along with how many times it appears across all the files. The source code of this program is available in the MapReduce tutorial that ships with the Apache Hadoop distribution.
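To make the goal concrete before bringing Hadoop in, here is a minimal single-machine sketch of what wordcount computes. It is plain Java with no Hadoop classes, so it is not the tutorial's MapReduce code; the class name `WordCountSketch` and the method `countWords` are made up for illustration, and words are assumed to be separated by whitespace.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Counts how many times each whitespace-separated word appears
    // across all the given lines of text.
    public static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // a blank line yields one empty token
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input standing in for the contents of some text files.
        List<String> lines = Arrays.asList("hello hadoop", "hello netbeans");
        for (Map.Entry<String, Integer> e : countWords(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In the MapReduce version, the inner loop roughly corresponds to the map step (emit each word with a count of 1) and the `merge` call to the reduce step (sum the counts per word).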
This tutorial is divided into three steps. First, we will install Karmasphere Studio for Hadoop, a Netbeans extension. Then, we will write some code. Finally, we will run the MapReduce job in Netbeans. Okay, fasten your seat belt. Here we go.
Not so long ago, I created a GUI for data storage using Java and, of course, JDBC. I followed a tutorial about how to insert records into a database table. The tutorial said that before inserting a new record into the database, we should generate a unique key as the record's identity. The generation is handled by Java code. The tutorial uses the System.currentTimeMillis() function to create the key, so the code looks like this:
//set new id
Number time = System.currentTimeMillis();
Integer id = (time.intValue()/10000);
The generated key is then inserted into the database.
As you may have realized, database systems usually have their own key generation technique, for example the AUTO_INCREMENT attribute in MySQL. I followed the tutorial and, sure enough, it works. But I also tried a database-generated key, and that works too. So why should we bother generating our own keys?
The O’Reilly Java Author gave an explanation about that. They said:
However, using the supported key generation tools of your database of choice presents several problems:
- Every database engine handles key generation differently. Thus, it is difficult to build a truly portable JDBC application that uses proprietary key generation schemes.
- Until JDBC 3.0, a Java application had no clear way of finding out which keys were generated on an insert.
- Automated key generation wreaks havoc with EJBs.
I got the point that the application should be portable. Who knows, someday we may migrate from one database system to another, and then database-generated keys will be hard to control.
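The second point in the quote refers to the generated-key retrieval that JDBC 3.0 added. A minimal sketch of it, assuming a hypothetical `users` table with a database-generated key column and a `name` column; the class `GeneratedKeySketch` and the method `insertUser` are made up for illustration, and whether keys are actually returned depends on the JDBC driver in use.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class GeneratedKeySketch {

    // Inserts one row and returns the key the database generated for it.
    // The table and column names are assumptions, not from the tutorial.
    public static long insertUser(Connection conn, String name) throws SQLException {
        String sql = "INSERT INTO users (name) VALUES (?)";
        // RETURN_GENERATED_KEYS asks the driver to make the new key available.
        try (PreparedStatement ps =
                 conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, name);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                if (!keys.next()) {
                    throw new SQLException("the driver returned no generated key");
                }
                return keys.getLong(1);
            }
        }
    }
}
```

The key only becomes known after the insert has run, which is the ordering constraint that application-side key generation avoids.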
Another detailed answer came from Scott Selikoff. He wrote a complete article about database key generation and gave an example:
Now, let’s say a user is in the process of creating a new record in your system. For each user record, you also have a set of postal addresses. For example, Bob may be purchasing items on NewEgg and have a home address and a work address. Furthermore, Bob enters his two addresses at the time he creates his account, so the application server receives the information to create all 3 records at once. In such a situation, you would normally have 3 records: 1 user record for Bob and 2 address records. You could add the address info in the user table, although then you have to restrict the number of addresses Bob can have and/or have a user table with a lot of extra columns.
Inserting Bob into the user table is straight forward enough, but there is a problem when you go to insert users into the address table, namely that you need Bob’s newly generated User Id in order to insert any records into the address table. After all, you can’t insert addresses without being connected to a specific user, lest chaos ensue in your data management system.
The problem arises with complex database relationships. If we don’t know the generated key, we can’t set the foreign key in the related tables. I didn’t notice this because the GUI I made used a simple database design, maybe with no table relations at all.
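The Bob example can be sketched with an application-generated key. This is only an illustration under my own assumptions: it uses java.util.UUID rather than the tutorial's System.currentTimeMillis() approach, it builds the records in memory instead of touching a database, and every class and field name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

public class AppGeneratedKeySketch {

    public static final class User {
        public final String id;
        public final String name;
        User(String id, String name) { this.id = id; this.name = name; }
    }

    public static final class Address {
        public final String id;
        public final String userId; // foreign key to User.id
        public final String label;
        Address(String id, String userId, String label) {
            this.id = id; this.userId = userId; this.label = label;
        }
    }

    public static final class NewAccount {
        public final User user;
        public final List<Address> addresses;
        NewAccount(User user, List<Address> addresses) {
            this.user = user; this.addresses = addresses;
        }
    }

    // Builds the user record and its address records in one go. The user's
    // key is generated here, so the foreign keys are known before any insert.
    public static NewAccount buildAccount(String name, List<String> addressLabels) {
        String userId = UUID.randomUUID().toString();
        List<Address> addresses = new ArrayList<>();
        for (String label : addressLabels) {
            addresses.add(new Address(UUID.randomUUID().toString(), userId, label));
        }
        return new NewAccount(new User(userId, name), addresses);
    }

    public static void main(String[] args) {
        NewAccount bob = buildAccount("Bob", Arrays.asList("home", "work"));
        for (Address a : bob.addresses) {
            System.out.println(a.label + " belongs to user " + a.userId);
        }
    }
}
```

All three records are complete before the first insert, so they could be written in any order the schema allows without asking the database for Bob's key first.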
So, I got my question answered. Do you have another answer to my question? Feel free to share.