NoSQL: the End of RDBMS?

05/05/2010

What? NoSQL? Yeah, you read that correctly. NoSQL. I forget when and where I first heard about it. But this data store technology caught my attention again when I attended the second Bancakan 2.0 meetup last March. Listening to the speaker, lynxluna, I remembered HBase, a scalable distributed database that is part of the Apache Hadoop project. For your information, Apache Hadoop is just one implementation of the MapReduce framework.

What is NoSQL?

So, what the hell is NoSQL? Here is the definition of NoSQL in Wikipedia:

NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally. Academics and papers typically refer to these databases as structured storage.


Yet Another Introduction to MapReduce (part 2)

13/03/2010

I’m sorry for the long delay since the first part. I’ve been pretty busy lately. In this part, I write about the idea of MapReduce, how it works, and how it distributes data and processing. This article draws heavily on Google’s MapReduce paper. I am writing it up again to deepen my own understanding of the concept. Enjoy!

What is MapReduce?

According to Wikipedia, MapReduce is a software framework patented by Google to support distributed computing on large data sets on clusters of computers. The framework was presented by Jeffrey Dean and Sanjay Ghemawat at OSDI’04: Sixth Symposium on Operating System Design and Implementation in December 2004. The main idea is to use functional programming techniques to simplify processing in a distributed environment.

MapReduce processes data using the list concept commonly used in functional programming. The process consists of two functions, map and reduce. Each function takes a list of input elements and produces a list of output elements. The map function takes the inputs and produces intermediate key-value pairs. These pairs are then sent to the reduce function. The reduce function takes these intermediate key-value pairs as input and, for each intermediate key, merges its values together to produce the output. According to the paper, each reduce invocation typically produces zero or one output value.
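To make the map and reduce steps above concrete, here is a minimal single-machine sketch in Python that counts words. It is only an illustration of the idea, not Google's or Hadoop's implementation: the names map_fn, reduce_fn, and run_mapreduce are made up for this example, and the grouping step that a real cluster performs across many machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_fn(doc_name, text):
    """Map: emit an intermediate (word, 1) pair for every word in the text."""
    return [(word.lower(), 1) for word in text.split()]


def reduce_fn(word, counts):
    """Reduce: merge all values that share one key into a single output value."""
    return sum(counts)


def run_mapreduce(documents, map_fn, reduce_fn):
    """Hypothetical driver that runs both phases in one process."""
    # Map phase: apply map_fn to every (name, text) input pair.
    intermediate = []
    for name, text in documents.items():
        intermediate.extend(map_fn(name, text))

    # Grouping step: collect all values that belong to the same key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one reduce_fn call per distinct intermediate key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    docs = {
        "a.txt": "the quick brown fox",
        "b.txt": "the lazy dog and the fox",
    }
    print(run_mapreduce(docs, map_fn, reduce_fn))
```

In this sketch each reduce call returns exactly one value (the total count for a word), which matches the paper's remark that a reduce invocation typically produces zero or one output value.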

Yet Another Introduction to MapReduce (part 1)

03/02/2010

There are already many articles out there about what MapReduce is, the basic concepts behind it, how it works, and many other things. Even so, I still want to write a little introduction to MapReduce. It’s mandatory, at least for me, to write about “something” in order to understand that “something”. I am testing my understanding of MapReduce in this post. I’ll use some of the resources available in the cloud, as I mentioned earlier. This is just another introduction to MapReduce.

Data, Data, Data

We are living in the cloud era. The Internet provides us with great resources that help our lives. In the process, we have created a lot of data. Consider a search engine like Google or Bing. They index sites all across the network. And if we are talking about sites these days, we are talking about a big number. Netcraft reported that there are more than 200 million sites in the world. This means a search engine must process and analyze a lot of data.

Research Plan

18/01/2010

Howdy,

When I was in college, I tried to implement Web Map Service (WMS) and Web Feature Service (WFS) as a foundation for a distributed Geographical Information System (better known as GIS). My academic advisor at the time told me that the idea was not entirely new, but that many people still didn’t know about it. With this topic as my thesis, he hoped that one day people would know about this technology.

The implementation that I made was actually quite simple. But let me tell you the complete story. At first, I was thinking about developing geographical operations that could be invoked via a web service in the cloud. After some weeks of analysis and gathering information, I found out that this work could be really hard and time-consuming. I didn’t have a background in geography (I’m a computer science student), and I didn’t have much time before the next graduation. In the end, I just created a spatial data repository and made it accessible across the network using GeoServer, an open source implementation of WMS and WFS. I then created a simple web application to pull the spatial data and display it in the browser. I also provided a simple data update feature, using one of the features of WFS. I used OpenLayers to create the application. It’s really simple, actually.

[Image: computers]

In my graduate studies right now, I want to try something entirely different. I want to explore MapReduce, a programming model for processing large-scale data in a distributed environment. I heard about this model from some mailing lists and websites, and I was surprised that the paper [pdf], the lecture notes, and the videos are easy to get. So, for the time being, I have decided to do some experiments in order to learn something about it.

It’s still just a plan in my head, actually. I have never talked about it with a thesis advisor (because I don’t have one yet). But I can predict some problems that I will have to deal with if I follow this research plan. They are:

  • The case. I don’t have any idea what case I should solve with this research. My college advisor suggested doing something in bioinformatics, like genome assembly. I think I will consider it. But I’m open to ideas.
  • The machines and their network. The lab is always busy with other graduate students. Fortunately, a friend of mine told me that there is another place on campus that I can use for experiments. But I have to write a permission letter first. Okay, I’ll do it.

In the meantime, I’ll focus on learning about MapReduce. Maybe I’ll post something about it on this blog. If you have a suggestion about what I should do with this programming model, let me know. I’d be really glad to hear it.


Credits:

EDSAC pictures, copyrighted Computer Laboratory, University of Cambridge, licensed under the Creative Commons Attribution 2.0 Generic license.