{"id":252,"date":"2009-11-16T16:01:00","date_gmt":"2009-11-16T16:01:00","guid":{"rendered":"http:\/\/blog.trungson.com\/?p=252"},"modified":"2009-11-16T16:01:00","modified_gmt":"2009-11-16T16:01:00","slug":"hadoop-vs-mysql","status":"publish","type":"post","link":"http:\/\/blog.trungson.com\/?p=252","title":{"rendered":"Hadoop vs. MySQL"},"content":{"rendered":"<p>I have been playing with Hadoop, HBase, Hive, and Pig for a couple of days, following Cloudera&#8217;s guide (thanks to Cloudera for packaging them for CentOS). Cloudera is heading in the right direction by targeting enterprises. Hadoop is definitely on my watch list as it matures. But right now, it&#8217;s very technical and would not be suitable for the general public. I was also disappointed by its performance on a small test cluster, which I understand is unfair given what it was designed for. For Hadoop to shine, you need two things: the problem has to be big enough, and so does the server farm. However, I think many companies test Hadoop on a small cluster before investing more time and money in it, and it&#8217;s the first impression that makes a lasting impact. As it matures, I expect to see optimizations that reduce its overhead on small, low-end clusters.<\/p>\n<p>Setting up MySQL is easy; scaling it is harder, but there is plenty of related software and technology to help. Don&#8217;t expect to switch to Hadoop\/HBase\/Hive in a day, though. The selling point is real (scaling without hard limits on commodity hardware is part of its core design), but there are many land mines you could step on if decisions are not evaluated carefully. Right now, I see Hadoop as a last resort: something to reach for when you have hit a wall after exhausting your RDBMS options and the supporting technology that helps them scale, such as memcache, message queues, and load balancing.
You should not choose Hadoop just because you started a company that might get big in a couple of years. Of course, there are exceptions when you know your problem domain can only be solved with a distributed system. Hadoop&#8217;s popularity may or may not change depending on whether its priority is to dominate both markets (small and large clusters) or to focus only on large server farms.<\/p>\n<p>Hadoop\/HBase\/Hive\/HDFS bring real complexity: setting up the cluster, breaking problems down into tasks, and scheduling batch operations. For many, many applications, MySQL (or an RDBMS in general) ain&#8217;t going anywhere, and I see smart companies using both for different parts of their operations. Unless Hadoop can effortlessly handle real-time, low-latency operations across distributed server farms, there is no clear winner now, and there may never be one. The trend toward real-time search (Twitter, Facebook) might speed this up.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have been playing with Hadoop, HBase, Hive, and Pig for a couple of days, following Cloudera&#8217;s guide (thanks to Cloudera for packaging them for CentOS). Cloudera is heading in the right direction by targeting enterprises. Hadoop is definitely on my watch list as it matures. 
But right now, it&#8217;s very technical and would not be suitable [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[],"_links":{"self":[{"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/posts\/252"}],"collection":[{"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=252"}],"version-history":[{"count":0,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=\/wp\/v2\/posts\/252\/revisions"}],"wp:attachment":[{"href":"http:\/\/blog.trungson.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=252"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=252"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/blog.trungson.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=252"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}