In software development, a version control system (VCS) plays an important role, especially when a project involves many programmers. Besides keeping track of changes, version control helps handle task distribution and the later integration of everyone's work. Basically, there are two flavors of version control system: centralized and distributed. There are many comparisons between these two flavors on the net; one of them explains it well with some illustrations. The key difference is that in a distributed VCS each programmer has a local working copy of the whole repository, while in a centralized VCS every change must be committed to the central repository. Continue reading
Universities and companies tend to have very strict Internet access policies. They usually deploy a proxy server to filter and deny access to websites they consider dangerous. Sometimes they just block access to social networking sites or certain email providers to restrain their network users from wasting time on what they call "unproductive things". When I looked around for a way to avoid this restriction, I stumbled upon a great tool called Tor. Continue reading
Like I told you in the last post, in order to create automatic part-of-speech tagging for text documents, I need to collect some corpora. In fact, because I want to do it on a distributed system, I need a large corpus. One great source of corpora is the web. But extracting plain text from HTML manually is quite cumbersome. I heard that a crawler can be used to extract text from the web, and then I stumbled upon Nutch.
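Just to show what the "manual" route looks like for a single page, here is a minimal sketch of plain-text extraction using Python's standard `html.parser` module. This is only an illustration of the task, not how Nutch does it internally:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

For example, `html_to_text('<p>Hello <b>world</b></p>')` gives `'Hello world'`. Doing this page by page, fighting broken markup and following links yourself, is exactly the cumbersome part a crawler takes care of.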
A Little About Nutch
Nutch is an open source search engine built on Lucene and Solr. According to Tom White, Nutch basically consists of two parts: a crawler and a searcher. The crawler fetches pages from the web and creates an inverted index from them; the searcher answers users' queries against the fetched pages. Nutch can run on a single computer, but it also works great on a multinode cluster: it uses Hadoop MapReduce to run well in a distributed environment.
Simple Crawling with Nutch
Let’s get to the point. My objective here is to build corpora from web pages. To achieve that, I’m just going to crawl some web pages and extract their text. So I won’t be writing about searching for now, though I may cover it in another post. Okay, this was my environment for this experiment:
- Ubuntu 10.10 Maverick Meerkat
- Java 6 OpenJDK
- Nutch version 1.0, which you can download here.
After you’re ready, let’s get started, shall we? Continue reading
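For reference, a typical Nutch 1.0 crawl boils down to a few steps, sketched below. The seed URL, domain filter, and directory names are my own placeholders; adapt them to your setup:

```shell
# 1. Put your seed URLs, one per line, in a plain-text file:
mkdir urls
echo 'http://example.com/' > urls/seed.txt

# 2. Restrict the crawl scope in conf/crawl-urlfilter.txt, e.g.:
#      +^http://([a-z0-9]*\.)*example.com/
#    and set the http.agent.name property in conf/nutch-site.xml.

# 3. Run the one-stop crawl command (depth = link levels to follow,
#    topN = max pages fetched per level):
bin/nutch crawl urls -dir crawl -depth 3 -topN 50

# 4. Dump a fetched segment, including its parsed text, to plain files:
bin/nutch readseg -dump crawl/segments/<segment-name> dump_dir
```

The parsed text in the segment dump is what I'm after for the corpora.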
About a year ago, I created a blog aggregator, sometimes also called a Planet. This planet displays all the blog posts from my registered friends. At first, I did it alone; I maintained and designed it by myself. Then my friend Andreas wanted to help me maintain the site, so I gave him the administrator role. Some days ago, he sent me this email:
Have you opened the website in IE6? The layout and the design look screwed :(
IE6, or Internet Explorer version 6, is the browser that shipped with Windows XP. It was released in 2001 and, at the time, had better CSS support than the previous version. The problem with this browser is its poor support for web standards. If you’re a web designer, you must know the problems of designing for IE6; it has a bad reputation among web designers. Continue reading
Two months ago, I wrote a simple tutorial on how to create a Hadoop MapReduce program using Netbeans. I didn’t have the slightest clue that this post would change my life. Okay, I’m exaggerating. I mean the post changed the history of this blog.
The first day after I wrote the post, nothing special happened. But the day after, I was shocked when I checked this blog’s stats. This blog usually gets about 15-25 visitors a day, so I was amazed to see 90 visitors that day. I started to investigate which post contributed the most traffic, and I found out it was the Hadoop in Netbeans post. I noticed that all of the traffic came from one site called DZone. DZone? I had never heard of that site before. When I investigated it, I found that it’s a cool bookmarking site for developers around the globe. And someone, whom I later knew as mitchp, had shared my post on the site.
The magic continued the next day. My traffic kept increasing. Then later that day, I got an email from my hosting provider:
The domain arifn.web.id has reached 80% of its bandwidth limit (807.50/1000.00 Megs).
Well, my bandwidth limit was just 1 GB a month. So I double-checked the hosting package and found that my bandwidth limit should have been 2 GB. I contacted my hosting provider to make sure they hadn’t made a mistake. They confirmed my bandwidth should be 2 GB and said they would resolve it a.s.a.p. Meanwhile, I installed WP Super Cache to prevent my site from going down because of the high traffic and low bandwidth limit.
The magic ended. The peak was on the third day, with nearly 200 visitors. After that, the traffic declined and found its equilibrium. But this equilibrium is higher than my average traffic before: about 20-35 visitors a day now. Not bad, huh?
Unfortunately, a completely different story is the mid- and long-term impact. By this I mean the number of people that discovered my portal thanks to the link and have become frequent visitors of the site since then. This is very difficult to assess (there is no way to know if a new subscriber originally discovered your site thanks to the DZone link or it is just a temporal coincidence that he/she joined the site around those dates), but if we look at the increase in the number of subscribers to the portal RSS feeds, my twitter account, or the daily visits to the site, my estimation is that only 2-3% of the original DZone visitors have converted into new portal followers.
I second that. Maybe it’s just a sweet temporal coincidence that my traffic grew above the average. But one thing I learned from this experience is that if you want a high amount of traffic, you should write good posts regularly. And I hope I can do that.
Do you have the same experience?