Note: It seems that Netbeans is no longer supported by Karmasphere Studio. For programming Hadoop in Eclipse, you can read about it here.

Hadoop MapReduce is an open-source implementation of the MapReduce programming model for processing large-scale data in a distributed environment. Hadoop is implemented in Java as a class library. There are several distributions of Hadoop, from Apache, Cloudera, and Yahoo!

Meanwhile, Netbeans is an integrated development environment (IDE) for programming in Java and many other programming languages. Netbeans (like any other IDE) helps programmers develop applications as easily and painlessly as possible. In our case, it helps us develop Hadoop MapReduce jobs.

In this post, I’ll show you step-by-step how to use Netbeans to develop a Hadoop MapReduce job. I’m using Netbeans 6.8 on Ubuntu Karmic Koala. The MapReduce program we are going to create here is a simple one called wordcount. It reads the text in a set of files and lists every word along with how many times it appears across all the files. The source code of this program is available in the MapReduce tutorial bundled with the Apache Hadoop distribution.

We divided this tutorial into three steps. First, we will install Karmasphere Studio for Hadoop, a Netbeans extension. Then, we will write the code. And finally, we will run the MapReduce job in Netbeans. Okay, fasten your seat belt.. here we go..

 

Install Karmasphere Studio for Hadoop

In order to do this, you must already have JDK 1.6 and Netbeans installed (of course). There is a nifty tutorial with pictures about how to install Karmasphere Studio for Hadoop on their site, but I’ll write it again here.

  1. Open your Netbeans and go to the Update Center via Tools > Plugins.
  2. In the Update Center, go to the Settings tab and click the Add button. Enter the following Name and URL in the Update Center Customizer window:
     Name: Karmasphere Studio for Hadoop
     URL: http://hadoopstudio.org/updates/updates.xml
  3. Now, select the Available Plugins tab. Find “Karmasphere Studio for Hadoop” in the list and check it. Then click the Install button.
  4. Click Next and accept the license agreement. Click Install to see the list of plugins that will be installed, then click Continue to download and install them. The plugins are about 20-something MB (I forgot the exact size). Wait for it, and when it’s finished, restart your IDE.
  5. Done, we are good to go.

Typing some code

Now, we are going to write the code for the wordcount program. To do this, you must have restarted your IDE after the plugin installation. If you haven’t done it, do it now; I’ll wait. Done? Okay, let’s continue.

  1. We need to create a new Java application. To do that, go to File > New Project. Pick Java Application project and click Next.
  2. In the next window, give WordCount as the name of the project. Then type WordCount as the Main Class. When you’re done, click Finish.
  3. Okay, the editor for WordCount.java is now open. But first, we must add the Hadoop library to the project. To do this, right-click Libraries under the WordCount project folder on the left side of the IDE, then pick Add Library.
  4. In the Add Library window, select Hadoop 0.20.0 as the version of Hadoop that we are going to use. Then click the Add Library button.
  5. The appropriate library has now been added to the project. Next, go back to the WordCount.java editor and replace its contents with the code below:
    import java.io.IOException;
    import java.util.*;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    public class WordCount {

    	public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    		private final static IntWritable one = new IntWritable(1);
    		private Text word = new Text();

    		public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    			String line = value.toString();
    			StringTokenizer tokenizer = new StringTokenizer(line);
    			while (tokenizer.hasMoreTokens()) {
    				word.set(tokenizer.nextToken());
    				output.collect(word, one); // emit (word, 1) for every token
    			}
    		}
    	}

    	public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    		public void reduce(Text key, Iterator<IntWritable> values,
    			OutputCollector<Text, IntWritable> output, Reporter reporter)
    			throws IOException {
    			int sum = 0;
    			while (values.hasNext()) {
    				sum += values.next().get(); // add up the 1s emitted for this word
    			}
    			output.collect(key, new IntWritable(sum));
    		}
    	}

    	public static void main(String[] args) throws IOException {
    		JobConf conf = new JobConf(WordCount.class);
    		conf.setJobName("wordcount");
    		conf.setOutputKeyClass(Text.class);
    		conf.setOutputValueClass(IntWritable.class);
    		conf.setMapperClass(Map.class);
    		conf.setReducerClass(Reduce.class);
    		conf.setInputFormat(TextInputFormat.class);
    		conf.setOutputFormat(TextOutputFormat.class);
    		FileInputFormat.setInputPaths(conf, new Path(args[0]));  // input directory
    		FileOutputFormat.setOutputPath(conf, new Path(args[1])); // output directory
    		try {
    			JobClient.runJob(conf);
    		} catch (IOException e) {
    			System.err.println(e.getMessage());
    		}
    	}
    }

    This program takes two arguments: the input directory path and the output directory path. In this post, I won’t explain the details of the code above. Please refer to the Apache Hadoop MapReduce tutorial if you want to know more.
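
    One optional tweak, borrowed from the Apache WordCount tutorial: you can also register the reducer as a combiner, so counts are pre-summed on the map side before the shuffle. A minimal sketch, one extra line in main before runJob is called:

        // Optional: pre-aggregate counts on the map side.
        // Reduce works as a combiner here because summing is associative.
        conf.setCombinerClass(Reduce.class);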

  6. After making sure there are no errors or typos, let’s build the program. To do this, right-click the WordCount project on the left side and pick Build. This step creates the JAR file of the program.
  7. Next, we will prepare the input for this program. We will create a folder and two text files inside it. For example, if you create the input folder in your home directory, its path will be /home/username/input. Inside it, create two text files; let’s name them file01 and file02. In the first file, type this sentence (without the quotes): “Hello world Bye world”

    And in the second file, type (without the quotes): “Hello Hadoop Bye Hadoop”

    Actually, you can type anything you want; the two sentences are just examples. Save the files when you’re done. (If you’d rather create the files from code, see the sketch below.)
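
    Here is a minimal sketch for creating the input from code instead of a text editor (the paths are just examples; replace username with your own):

        import java.io.File;
        import java.io.FileWriter;
        import java.io.IOException;

        public class CreateInput {
            public static void main(String[] args) throws IOException {
                // Example path; adjust it to your own home directory.
                File dir = new File("/home/username/input");
                dir.mkdirs();
                FileWriter f1 = new FileWriter(new File(dir, "file01"));
                f1.write("Hello world Bye world\n");
                f1.close();
                FileWriter f2 = new FileWriter(new File(dir, "file02"));
                f2.write("Hello Hadoop Bye Hadoop\n");
                f2.close();
            }
        }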

  8. We are done in this step. Let’s go to the final step.

Running the MapReduce job

Okay. Now we are going to run the MapReduce job locally in Netbeans. This is how it’s done.

  1. On the left side of the IDE, click the Services tab. Right-click on the Hadoop Jobs and pick New Job.
  2. Give WordCount as the Job Name and select “Hadoop Job from pre-existing JAR file” as the type. Click Next when you’re done.
  3. Then, browse to the JAR file we created in the previous step. Click Browse and go to your Netbeans WordCount project folder. The JAR file is located in the dist folder. If you’re using Netbeans default settings, the JAR file will be in /home/username/NetbeansProjects/WordCount/dist. Click Next when you’re done.
  4. In the Set Job Defaults step (Step 5 of 5), choose In-Process Thread (0.20.0) as the default cluster. Then, in Default Arguments, type the arguments the program needs: in this case, the input and output directory paths. Type the input folder we created earlier followed by the output folder:
     /home/username/input /home/username/output
     For your information, we don’t need to create the output folder first; the program will create it for us. Click Finish when you’re done.
  5. Now, we will finally run the MapReduce job. To do this, right-click WordCount under the Hadoop Jobs list and pick Run Job…
  6. In the Execute Hadoop Job window, give WordCount as the Job Name and click Run.
  7. If your job executes successfully, there will be an output directory, and inside it you’ll find a file with contents like this:
    Bye	2
    Hadoop	2
    Hello	2
    world	2

Now we’re done. If you have a question, feel free to ask me. But for your information, I’m still learning about this too. Let’s study about it together. Have a nice try and see you on the next post.

Next post: Hadoop on single-node cluster

87 thoughts on “Programming Hadoop in Netbeans”

  1. is english mandatory here? >_>

    well, emm, i followed the steps and succeeded. while i'm still a noob at these, i think i can still understand a lil' bit, maybe because i took a parallel programming subject back then :D. after skimming the wiki article, i assume the concept is basically similar to MPI in parallel programming

    btw, this is a great tutorial and very well written. keep up the good work!

  2. hello..

    i want ask you about Hadoop..

    Can I develop a Hadoop MapReduce job using Netbeans 6.8 on Windows XP?

    Thanks in advance.. I hope you answer my question..

    1. hello there..

      Sorry, I just checked my site again.. ^^

      I never tried it in Windows XP, but some said that it worked there..

      but if you want to deploy Hadoop MapReduce jobs, I think you can only do it on Linux..

      CMIIW..

  3. I've already followed the steps of your article about writing a MapReduce program with Hadoop. Unfortunately mine is unsuccessful and the output folder doesn't appear. I'm so confused. Would you help me?

    1. Really?

      Did the output console produce an error message?

      One thing to remember: you shouldn't create the output folder first.. the code will do it for you..
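
      In fact, if the output folder is left over from a previous run, Hadoop will refuse to start the job. A sketch of a common workaround (assuming the JobConf from the tutorial, plus an extra import of org.apache.hadoop.fs.FileSystem) is to clear the stale folder in main before runJob:

        // Delete a stale output directory so the job can recreate it.
        Path outputPath = new Path(args[1]);
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true); // true = delete recursively
        }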

      1. I'm done with my quick start on the Hadoop tutorial and have successfully deployed my cluster with 3 PCs. By the way, can Hadoop access a database file like SQLite and write to it? I want to build document indexing with Hadoop.

      2. @Fikri:

        for storing indexed documents you can use HBase, a non-relational database built on top of Hadoop.

        For further exploration of indexing documents for search purposes, you can check out Lucene.

  4. Hi,

    I was trying this program on Windows XP and I got this problem:

    Cannot run program "chmod": CreateProcess error=2

    It seems it has a problem with the "chmod" command. Does anybody know how to solve this problem?

    Thanks,

    Reza

    1. Hmm…

      AFAIK, chmod is a command for changing file/directory access permissions on Linux/UNIX. Maybe the error exists because there is no chmod command on Windows..

      It makes me wonder where the command came from.. (Most likely Hadoop's local filesystem shells out to chmod when setting file permissions, so on plain Windows, without something like Cygwin providing chmod on the PATH, the call fails.)

  5. I've tried your tutorial above and did it exactly the same as yours, but there is still an error. The error:

    java.util.ConcurrentModificationException

    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)

    at java.util.AbstractList$Itr.next(AbstractList.java:343)

    at com.karmasphere.studio.hadoop.executor.HadoopExecutor.getClassLoaderRoots(HadoopExecutor.java:412)

    at com.karmasphere.studio.hadoop.executor.HadoopExecutor.getClassLoader(HadoopExecutor.java:490)

    at com.karmasphere.studio.hadoop.executor.HadoopExecutor.getMainClass(HadoopExecutor.java:520)

    at com.karmasphere.studio.hadoop.executor.HadoopExecutor.getMainMethod(HadoopExecutor.java:642)

    at com.karmasphere.studio.hadoop.executor.HadoopExecutor.isInvokable(HadoopExecutor.java:771)

    at com.karmasphere.studio.hadoop.executor.HadoopExecutorConfigPanel.initJobPropertyEditor(HadoopExecutorConfigPanel.java:180)

    at com.karmasphere.studio.hadoop.executor.HadoopExecutorConfigPanel.(HadoopExecutorConfigPanel.java:129)

    at com.karmasphere.studio.hadoop.job.RunJobAction.runJob(RunJobAction.java:37)

    at com.karmasphere.studio.hadoop.job.RunJobAction.runJob(RunJobAction.java:46)

    at com.karmasphere.studio.hadoop.job.RunJobAction.performAction(RunJobAction.java:67)

    at org.openide.util.actions.NodeAction$DelegateAction$1.run(NodeAction.java:589)

    at org.netbeans.modules.openide.util.ActionsBridge.implPerformAction(ActionsBridge.java:83)

    at org.netbeans.modules.openide.util.ActionsBridge.doPerformAction(ActionsBridge.java:64)

    at org.openide.util.actions.NodeAction$DelegateAction.actionPerformed(NodeAction.java:585)

    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)

    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)

    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)

    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)

    at javax.swing.AbstractButton.doClick(AbstractButton.java:357)

    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:1223)

    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:1264)

    at java.awt.Component.processMouseEvent(Component.java:6263)

    at javax.swing.JComponent.processMouseEvent(JComponent.java:3267)

    at java.awt.Component.processEvent(Component.java:6028)

    at java.awt.Container.processEvent(Container.java:2041)

    at java.awt.Component.dispatchEventImpl(Component.java:4630)

    at java.awt.Container.dispatchEventImpl(Container.java:2099)

    at java.awt.Component.dispatchEvent(Component.java:4460)

    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4574)

    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4238)

    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4168)

    at java.awt.Container.dispatchEventImpl(Container.java:2085)

    at java.awt.Window.dispatchEventImpl(Window.java:2478)

    at java.awt.Component.dispatchEvent(Component.java:4460)

    [catch] at java.awt.EventQueue.dispatchEvent(EventQueue.java:599)

    at org.netbeans.core.TimableEventQueue.dispatchEvent(TimableEventQueue.java:125)

    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)

    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)

    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)

    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)

    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)

    at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)

    1. From the exception identification (ConcurrentModificationException), it looks like the Hadoop executor tried to modify an AbstractList while iterating over it.

      What is your development environment? Did you do something else?

      1. I used the Netbeans IDE and the newest Karmasphere update, which is for Hadoop 0.20.2. I don't know what is wrong with my app. I'm sure that I've followed your instructions above line by line.

        Can I have your YM or GTalk id?

      2. I haven't tried the latest Karmasphere.. I'll try it and report it here asap..

        I've sent you my YM id, check your email. Thank you.. :D

  6. hiiiiiiiiiii…

    hey, during running the MapReduce job… after step 5, which is as follows: "we will finally run the MapReduce job. To do this right-click the WordCount under the Hadoop Jobs list and pick Run Job…"

    as i executed this step, a window appeared named "Hadoop deployment wordcount",

    and the content is as given below… i was not able to get to step 6, and no output folder was created:

    Using cluster In-Process Thread (0.20.2)

    Using filesystem Local Filesystem /

    Preparing to execute job 2010-06-07/wordcoun-152434_J5

    Creating class loader…Building configuration resources…OK.

    Creating composite jar file…

    Writing to /tmp/hadoop-job-2357655027006475012.jar…

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/commons-cli-2.0-SNAPSHOT.jar

    Skipping /home/shubham/.netbeans/6.8/modules/ext/commons-codec-1.3.jar: it looks like a stock Hadoop jar.

    Skipping /home/shubham/.netbeans/6.8/modules/ext/commons-httpclient-3.1.jar: it looks like a stock Hadoop jar.

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/commons-logging-1.1.1.jar

    Skipping /home/shubham/.netbeans/6.8/modules/ext/commons-net-1.4.1.jar: it looks like a stock Hadoop jar.

    Skipping /home/shubham/.netbeans/6.8/modules/ext/oro-2.0.8.jar: it looks like a stock Hadoop jar.

    Skipping /home/shubham/.netbeans/6.8/modules/ext/log4j-1.2.15.jar: it looks like a stock Hadoop jar.

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/jets3t-0.7.1.jar

    Skipping /home/shubham/.netbeans/6.8/modules/ext/xmlenc-0.52.jar: it looks like a stock Hadoop jar.

    Skipping /home/shubham/.netbeans/6.8/modules/ext/hadoop-0.20.2-core.jar: it looks like a stock Hadoop jar.

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/hadoop-0.20.2-streaming.jar

    Adding standard JAR library lib/hadoop-0.20-karmasphere-extras.jar

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/karmasphere-client.jar

    Aggregating configuration file META-INF/vfs-providers.xml

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/karmasphere-client-amazon.jar

    Aggregating configuration file META-INF/vfs-providers.xml

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/jsch-0.1.42-patched.jar

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/commons-beanutils-core-1.8.2.jar

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/commons-lang-2.4.jar

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/commons-io-1.4.jar

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/commons-vfs-1.0.jar

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/collections-generic-4.01.jar

    Merging JAR library /home/shubham/.netbeans/6.8/modules/ext/jsr305.jar

    Merging Hadoop JAR library /home/shubham/NetBeansProjects/WordCoun/dist/WordCoun.jar

    Adding aggregated configuration file META-INF/vfs-providers.xml

    Writing done. File size is 3M

    Adding CompositeJar file /tmp/hadoop-job-2357655027006475012.jar to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-cli-2.0-SNAPSHOT.jar!/[MERGE] to roots.

    Adding ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-codec-1.3.jar!/ to roots.

    Adding ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-httpclient-3.1.jar!/ to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-logging-1.1.1.jar!/[MERGE] to roots.

    Adding ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-net-1.4.1.jar!/ to roots.

    Adding ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/oro-2.0.8.jar!/ to roots.

    Adding ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/log4j-1.2.15.jar!/ to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/jets3t-0.7.1.jar!/[MERGE] to roots.

    Adding ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/xmlenc-0.52.jar!/ to roots.

    Adding ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/hadoop-0.20.2-core.jar!/ to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/hadoop-0.20.2-streaming.jar!/[MERGE] to roots.

    Adding ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/hadoop-0.20-karmasphere-extras.jar!/ to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/karmasphere-client.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/karmasphere-client-amazon.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/jsch-0.1.42-patched.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-beanutils-core-1.8.2.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-lang-2.4.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-io-1.4.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/commons-vfs-1.0.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/collections-generic-4.01.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/.netbeans/6.8/modules/ext/jsr305.jar!/[MERGE] to roots.

    Not adding MERGE ClassPath entry jar:file:/home/shubham/NetBeansProjects/WordCoun/dist/WordCoun.jar!/[INSPECT_JAVA, INSPECT_HADOOP, MERGE] to roots.

    please reply to my comment, either at my email id or on this same website….

    it's very important for me.. please do reply as soon as possible…..

    thanks a lot.. this tutorial helped me a lot in installing and connecting Hadoop and Netbeans…. thanks

    do reply

    1. Hmm..

      I can suggest some quick solutions; I dunno if they will work or not..

      first, check your output folder permissions: does it allow creating a new directory?

      second, from the stack trace of your program, I see that you named the project WordCoun, not WordCount.. I dunno if this matters, but try changing it.

      If the problem still persists, tell me more about it..

  7. Hello.. I'm testing this code but I'm facing some errors too.

    Do you have any idea what's going on? Or can we discuss this using IM?

    My specification is:

    I'm running it on Ubuntu using VMware (my host is XP)

    I'm using hadoop 0.20.2

    Thanks in advance for your help.

    10/06/10 21:03:26 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

    10/06/10 21:03:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

    10/06/10 21:03:29 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

    10/06/10 21:03:30 INFO mapred.FileInputFormat: Total input paths to process : 2

    10/06/10 21:03:31 INFO mapred.FileInputFormat: Total input paths to process : 2

    10/06/10 21:03:31 INFO mapred.JobClient: Running job: job_local_0001

    10/06/10 21:03:31 INFO mapred.MapTask: numReduceTasks: 1

    10/06/10 21:03:31 INFO mapred.MapTask: io.sort.mb = 100

    10/06/10 21:03:32 WARN mapred.LocalJobRunner: job_local_0001

    java.lang.OutOfMemoryError: Java heap space

    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.(MapTask.java:781)

    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

    10/06/10 21:03:32 INFO mapred.JobClient: map 0% reduce 0%

    10/06/10 21:03:32 INFO mapred.JobClient: Job complete: job_local_0001

    10/06/10 21:03:32 INFO mapred.JobClient: Counters: 0

    Job failed!

    BUILD SUCCESSFUL (total time: 9 seconds)

    1. I'm sorry..

      Like I said earlier, I haven't tried it on the latest version of Karmasphere and Hadoop. Maybe there are some changes in the latest version that I don't know about..

      I'll try this on the latest version, and I'll report back to you as soon as possible.

      Cheers..

    2. Hi,

      I am also getting the same issue and poking around for a solution. I changed all the memory configuration for Netbeans without any result.

      Could anybody please help?

      Thanks in advance.

      1. Hi Papu (and Gary)..

        I’m sorry, I just got the chance to try the latest version of Karmasphere.. And you know what? I got the same problems as you two..

        Then I saw that Papu sent a message to the Karmasphere mailing list, and I tried the response given by the support..
        The Karmasphere support asked us to upgrade our JDK, and after I upgraded it, everything worked well..

        You should try it.. Good luck.. :D

  8. thank you for this tutorial. it worked perfectly. however, i copied the generated WordCount.jar to my hadoop cluster but i could not run it over there. please can you give an insight on how to run a mapreduce program (i.e. a jar file) in hadoop? i have only been running the examples from the hadoop benchmark, but i do not know how to import my program and run it in my cluster. sorry, i'm still a newbie in this technology. i will appreciate it more if you can send a response to my email. thank you.

    Joseph

  9. here is an extract of the error i was getting.

    As you can see from the jps command, my hadoop is running ok.

    hadoop@ubuntu:/usr/local/hadoop/hadoop$ jps

    4032 TaskTracker

    4195 Jps

    3509 NameNode

    3865 JobTracker

    3654 DataNode

    3807 SecondaryNameNode

    then i tried to run the wordcount jar file, after copying it to another location on my local system, i.e. /home/hadoop/ (the jar file)

    however, when i tried to run it, this is the error i got:

    hadoop@ubuntu:/usr/local/hadoop/hadoop$ bin/hadoop jar /home/hadoop/WordCount.jar WordCount /home/hadoop/input /home/hadoop/output

    10/07/18 11:43:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

    Input path does not exist: hdfs://localhost:54310/user/hadoop/WordCount

    please can anyone assist me.

    1. From the error message, I assume that the program can't find your input path..

      It is looking for /home/hadoop/input, which doesn't exist yet. Maybe you should make sure that the directory is available in HDFS.

      CMIIW
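
      For example, here is a sketch (the paths are just examples) that copies the local input folder into HDFS from code before submitting the job:

        // Assumes 'conf' points at your cluster (fs.default.name set to the namenode).
        FileSystem fs = FileSystem.get(conf);
        Path hdfsInput = new Path("/user/hadoop/input");
        if (!fs.exists(hdfsInput)) {
            fs.copyFromLocalFile(new Path("/home/hadoop/input"), hdfsInput);
        }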

  10. it looks to be very helpful, i was searching for just this. very valuable tutorial. I will try it, and at the earliest I will share my experience.

    I have a query……

    can I simulate a Hadoop cluster on 1 machine (laptop/desktop), or do I have to have a cluster of machines?

    if a cluster is required anyway, can you point me to a tutorial to build the cluster to run hadoop mapreduce projects on it?

    I will be very thankful.

  11. Good Morning sir

    i am doing my final year MTech and want to do a project in MapReduce using the Hadoop tool. I want to clarify some doubts with you, sir.

    i tried what you said in this tutorial and it works excellently. i tried it up to running the wordcount program and it was working. but i want to know how to build a small application using the Hadoop tool, that is, using JSP or Swing concepts to develop a user interface for an application.

    if you have any ideas please give me some guidelines sir, i am in need sir, please. i hope i will get a reply from you sir.

    1. I never tried it before, but one thing that comes to mind is that you can execute shell commands from your Java class.

      I googled and found some readings that I hope can help you:

      Stackoverflow

      DZone Snippets

      Oracle Sun Forum

      and many more…

      If you succeed, it would be nice if you wrote an article about it on your blog and told me about it..

  12. Hi, it's really a nice tutorial..
    I wish to develop a multitenant web application framework for data analysis.
    To store and manage the data of various tenants I need a SQL-like, row-oriented database (e.g. Hive), and the job execution follows basic MapReduce processing.

    I've heard that Karmasphere Studio offers developing Hadoop jobs along with Hive (database) support. I've tried the latest version of it. But the current version of Netbeans IDE, 6.8 or 6.9.1, has no option for developing a web application, and I am also not able to download the plugin for Karmasphere Studio.. (getting Error: 503 service unavailable).
    Am I going in the right direction?
    Will Netbeans support developing a multitenant web application well?
    Could you please tell me which Netbeans version for Linux provides support for my requirement?
    Please give me your gmail or other mail id to discuss more about the project.

    Please give your valuable suggestions regarding this issue.
    Thanks in advance.

    1. I don't know about Hive yet..

      Maybe you can check out MongoDB. It's a document-oriented database that supports MapReduce processing.

      Maybe Java EE has support for multitenant applications. And if it does, then you can use Netbeans..

      I gave you a detailed explanation in your email.. check it out..

  13. hello sir,

    i tried your code with Hadoop 0.20.2 and Netbeans 6.8,

    configured as per your post, but am getting the following error…

    run:

    10/09/20 14:35:12 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

    10/09/20 14:35:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

    10/09/20 14:35:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.

    10/09/20 14:35:13 INFO mapred.FileInputFormat: Total input paths to process : 2

    10/09/20 14:35:13 INFO mapred.JobClient: Running job: job_local_0001

    10/09/20 14:35:13 INFO mapred.FileInputFormat: Total input paths to process : 2

    10/09/20 14:35:13 INFO mapred.MapTask: numReduceTasks: 1

    10/09/20 14:35:13 INFO mapred.MapTask: io.sort.mb = 100

    10/09/20 14:35:13 WARN mapred.LocalJobRunner: job_local_0001

    java.lang.OutOfMemoryError: Java heap space

    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.(MapTask.java:781)

    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:350)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

    10/09/20 14:35:14 INFO mapred.JobClient: map 0% reduce 0%

    10/09/20 14:35:14 INFO mapred.JobClient: Job complete: job_local_0001

    10/09/20 14:35:14 INFO mapred.JobClient: Counters: 0

    Job failed!

    BUILD SUCCESSFUL (total time: 1 second)

    please give me some help

    1. Dear Pranay,

      If you get java.lang.OutOfMemoryError, my first suggestion is to update your JDK to the latest version..

      I hope that helps.. :D
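
      Another thing you could try, a sketch I haven't verified on every setup: since the error comes from MapTask's output buffer, you can shrink the map-side sort buffer in the job configuration, and on a real cluster give the task JVMs a bigger heap (when running in-process in Netbeans, the job shares the IDE's JVM, so raising Netbeans' own -Xmx may also help):

        // Values here are examples; both keys exist in Hadoop 0.20.
        conf.setInt("io.sort.mb", 50);                  // smaller map-side sort buffer
        conf.set("mapred.child.java.opts", "-Xmx512m"); // larger task JVM heap (cluster mode)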

  14. Hello sir,

    i am currently using JDK 1.6, but as per your suggestion I just changed my Netbeans version and the problem got solved… thanks a lot for the help…

  15. Hi Arif,

    Right now I am trying to set up a multinode Hadoop cluster using CentOS 5.4 with 3 nodes. While trying to execute the command hadoop namenode -format,

    I am getting the following error..

    [hadoop@localhost hadoop-0.20.2]$ bin/hadoop namenode -format

    Exception in thread "main" java.lang.NoClassDefFoundError: Djava/net/preferIPv4Stack=true

    Caused by: java.lang.ClassNotFoundException: Djava.net.preferIPv4Stack=true

    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)

    at java.security.AccessController.doPrivileged(Native Method)

    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)

    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)

    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)

    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)

    Could not find the main class: Djava.net.preferIPv4Stack=true. Program will exit.

    [hadoop@localhost hadoop-0.20.2]$ bin/hadoop namenode -format

    10/10/02 17:26:39 INFO namenode.NameNode: STARTUP_MSG:

    /************************************************************

    STARTUP_MSG: Starting NameNode

    STARTUP_MSG: host = localhost.localdomain/127.0.0.1

    STARTUP_MSG: args = [-format]

    STARTUP_MSG: version = 0.20.2

    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/br… -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

    ************************************************************/

    [Fatal Error] hdfs-site.xml:18:4: The element type "name" must be terminated by the matching end-tag "</name>".

    10/10/02 17:26:39 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: The element type "name" must be terminated by the matching end-tag "</name>".

    10/10/02 17:26:39 ERROR namenode.NameNode: java.lang.RuntimeException: org.xml.sax.SAXParseException: The element type "name" must be terminated by the matching end-tag "</name>".

    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1168)

    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1030)

    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)

    at org.apache.hadoop.conf.Configuration.set(Configuration.java:405)

    at org.apache.hadoop.hdfs.server.namenode.NameNode.setStartupOption(NameNode.java:927)

    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:944)

    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

    Caused by: org.xml.sax.SAXParseException: The element type "name" must be terminated by the matching end-tag "</name>".

    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:239)

    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:283)

    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:180)

    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079)

    … 6 more

    10/10/02 17:26:39 INFO namenode.NameNode: SHUTDOWN_MSG:

    /************************************************************

    SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1

    ************************************************************/

    [hadoop@localhost hadoop-0.20.2]$ bin/hadoop namenode -format

    10/10/02 17:28:43 INFO namenode.NameNode: STARTUP_MSG:

    /************************************************************

    STARTUP_MSG: Starting NameNode

    STARTUP_MSG: host = localhost.localdomain/127.0.0.1

    STARTUP_MSG: args = [-format]

    STARTUP_MSG: version = 0.20.2

    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/br… -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010

    ************************************************************/

    10/10/02 17:28:44 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop_user

    10/10/02 17:28:44 INFO namenode.FSNamesystem: supergroup=supergroup

    10/10/02 17:28:44 INFO namenode.FSNamesystem: isPermissionEnabled=true

    10/10/02 17:28:44 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /hadoop/hdfs/name/current

    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:295)

    at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1086)

    at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1110)

    at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:856)

    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:948)

    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

    10/10/02 17:28:44 INFO namenode.NameNode: SHUTDOWN_MSG:

    /************************************************************

    SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1

    ************************************************************/

    [hadoop@localhost hadoop-0.20.2]$ ls

    I am not able to start the cluster either..

    What is meant by PATH_TO_JDK_INSTALLATION ?? i have set the environment variable as "usr/bin/java"..

    Also, on one of the nodes i am not able to update the java version from 1.4 to 1.6 even after successful installation…

    i've tried to uninstall the java 1.4 version but i could not succeed..

    I have followed the tutorial by Michael Noll for the above setup: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)

    Kindly refer to this too: [http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos-5.aspx ]

    Please help me..

    Thanks in advance..

    Regards,

    Sangita..

  16. Hello

    We tried the wordcount example on Hadoop in Windows using Karmasphere.

    When executing, it asks for a Unix OS:

    Invocation failed.

    com.karmasphere.studio.hadoop.executor.HadoopExecutorException: Hadoop's JobClient only works on Unix-based operating systems. If you need to deploy directly from a Windows system, you must rewrite the job to use the Karmasphere Hadoop client library. Otherwise, you can export the job, copy it to a Unix system, and run it from there.

    Where can I get the procedure to install using the Karmasphere Hadoop client library?

    with regards

    Dr G sudha Sadasivam

  17. I ran Hadoop on Windows XP and got the error below:

    Cannot run program "chmod": CreateProcess error=2

    can you help me solve the problem?

    thanks in advance!

  18. Hello

    I am running Matrix Multiplication code and getting the following error. Please help me:

    11/02/23 19:02:45 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

    11/02/23 19:02:46 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

    11/02/23 19:02:47 INFO input.FileInputFormat: Total input paths to process : 4

    11/02/23 19:02:47 INFO mapred.JobClient: Running job: job_local_0001

    11/02/23 19:02:47 INFO input.FileInputFormat: Total input paths to process : 4

    11/02/23 19:02:47 INFO mapred.MapTask: io.sort.mb = 100

    11/02/23 19:02:48 INFO mapred.MapTask: data buffer = 79691776/99614720

    11/02/23 19:02:48 INFO mapred.MapTask: record buffer = 262144/327680

    11/02/23 19:02:48 WARN mapred.LocalJobRunner: job_local_0001

    java.io.EOFException

    at java.io.DataInputStream.readFully(DataInputStream.java:197)

    at java.io.DataInputStream.readFully(DataInputStream.java:169)

    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)

    at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1428)

    at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1417)

    at org.apache.hadoop.io.SequenceFile$Reader.(SequenceFile.java:1412)

    at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)

    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)

    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:620)

    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)

    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

    11/02/23 19:02:48 INFO mapred.JobClient: map 0% reduce 0%

    11/02/23 19:02:48 INFO mapred.JobClient: Job complete: job_local_0001

    11/02/23 19:02:48 INFO mapred.JobClient: Counters: 0

    Exception in thread "main" java.lang.Exception: Job 1 failed

    at matrixmultiply.job1(matrixmultiply.java:620)

    at matrixmultiply.runJob(matrixmultiply.java:743)

    at matrixmultiply.main(matrixmultiply.java:803)

    Java Result: 1

    BUILD SUCCESSFUL (total time: 5 seconds)

    My development environment is Hadoop 0.20.2 and Netbeans 6.7.1.

    1. The trace says you got an EOFException, which happens because an end of file or end of stream was reached unexpectedly during input.

      I think you should check your Mapper input format or file location again.. I hope that helps.. :D
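
      For instance, if the job is meant to read SequenceFiles, the input format has to match the files on disk. A sketch using the old mapred API from this tutorial:

        // A SequenceFile reader will fail with errors like this
        // if it is pointed at plain text files (and vice versa).
        conf.setInputFormat(org.apache.hadoop.mapred.SequenceFileInputFormat.class);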

      1. Hi,

        Regarding the matrix multiplication program, I am using the SequenceFile input format, which means my input directory should be in HDFS. How can I use it from Netbeans to give it as input?

        I want multiplication of around 20K * 20K matrices, and it should be fast. Any suggestions for it?

        Thanks for your mail.

      2. Yes, you can use Karmasphere to do that.

        First, you have to configure your cluster, then register it with Karmasphere. Then you can create a Hadoop job and run it from your Netbeans IDE..

        I have a plan to write about that, just be patient.. :D

      3. I want to ask one more thing: is there any way we can convert our input (text file) format to SequenceFileInputFormat and vice versa, so that our application runs faster?
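
        For reference, here is one way to do such a conversion (a sketch with example paths, using the Hadoop 0.20 API; imports needed: java.io.*, org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.*, org.apache.hadoop.io.*). It writes each line of a text file into a SequenceFile as a (LongWritable, Text) record, which SequenceFileInputFormat can then read:

          // Convert a plain text file into a SequenceFile of (LongWritable, Text) records.
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          Path out = new Path("/user/hadoop/input-seq/part-0"); // example output path
          SequenceFile.Writer writer = SequenceFile.createWriter(
                  fs, conf, out, LongWritable.class, Text.class);
          BufferedReader in = new BufferedReader(
                  new FileReader("/home/hadoop/input/file01")); // example input file
          long offset = 0;
          String line;
          while ((line = in.readLine()) != null) {
              writer.append(new LongWritable(offset), new Text(line));
              offset += line.length() + 1; // mimic TextInputFormat's byte-offset keys
          }
          in.close();
          writer.close();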

  19. Hello

    I have the same OutOfMemoryError problem, and I have updated my Java to the latest version, but it doesn't work.

    My Netbeans version is 6.9.1, the JDK version is 1.6.0_24, and the Hadoop version is 0.20.

    I need help~.

    thx!

  20. Hi, you said that you are using Ubuntu Karmic Koala; is it a good distribution? I want to install Netbeans and Linux on my new computer but am not sure which one will work well with Netbeans. Thanks!

    1. Ubuntu is a good distribution.. Really good indeed..
      And it works well with Netbeans.. You can try the latest distribution of Ubuntu..

  21. Hadoop's JobClient only works on Unix-based operating systems. If you need to deploy directly from a Windows system, you must rewrite the job to use the Karmasphere Hadoop client library. Otherwise, you can export the job, copy it to a Unix system, and run it from there.

  22. hi, I'm using Netbeans 7.0 with the indicated plugin installed, but when I copy in the code it does not recognize the imports. any ideas?

    1. To "public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>;" you have ";" but should be "{" ;)

    2. have you added the Karmasphere Hadoop library to your project's Libraries?
      Check the 3rd step in the "Typing some code" part.. :D

  23. I've tried your tutorial above and did it exactly the same as yours, but there is still an error. The error:

    java.util.ConcurrentModificationException

    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)

    at java.util.AbstractList$Itr.next(AbstractList.java:343)

  24. Hi, I tried doing it but this appeared:

    Cannot run program "chmod": CreateProcess error=2

    I'm just baffled as to why chmod did not work..? Does anybody know how to solve this problem?

  25. I have followed this article and got nothing but:
    Using cluster In-Process Thread (0.20.2)

    in the end, with no output at all. I need some help.

  26. Wow.. Good blog, Mas.. You have got a lot of readers from all over the world, Mas.. Hats off and thumbs up. (ar-wdh.blogspot.com)

  27. I get a “connection closed” error while trying to browse the Hadoop file system. The port used is 54311.

  28. I mean, everything works fine while using the local file system. What I want to do is be able to run with HDFS. I have one Ubuntu-based Hadoop master and 1 Ubuntu-based Hadoop slave. The ports assigned for the namenode and datanode are 54311 and 54310 respectively. Please advise on this.

    1. did you browse using Karmasphere, or using the console?
      are you sure you already started the Hadoop daemons?
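
      If the goal is to make the job itself run against HDFS instead of the local filesystem, a sketch (the host name and port mapping follow the common Michael Noll setup; swap in your own) is to point the job configuration at your namenode and jobtracker:

        // Send the job to the cluster instead of the local runner.
        conf.set("fs.default.name", "hdfs://master:54310");  // namenode URI
        conf.set("mapred.job.tracker", "master:54311");      // jobtracker address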

  29. hi, can I execute Hadoop on Windows XP with Netbeans??
    How can I do it (execute on WinXP with Netbeans)?
    I installed the Hadoop plugin in my Netbeans.

  30. Hello,

    I tried to install the Karmasphere plugin the way you said, but it gives an error that the server response is 403 and to check my proxy settings, etc. I went to the site http://www.hadoopstudio.org and it takes me to “www.karmasphere.com/404.html”. I am not able to install this plugin; kindly tell me what to do about this and I will be very thankful to you.

  31. I am NOT getting the required plugin “Karmasphere Studio for Hadoop” in Netbeans after going through the download process. Is there any other way to install it? Has the update URL changed?

    Please suggest me..
