Archives in November, 2009

DataTable jQuery Plugin – JavaScript inside a table cell

If the data source is AJAX and a cell contains JavaScript, the node is created and assigned directly via nTd.innerHTML = data, so any JavaScript code inside the cell won’t be executed. To fix this, use a custom fnRender for that column.

Hadoop vs. MySQL

I’ve been playing with Hadoop, HBase, Hive, and Pig via Cloudera’s guide (thanks to Cloudera for bringing these packages to CentOS) for a couple of days. Cloudera is going in the right direction, targeting the enterprises. Hadoop is definitely on the watch list as it matures, but right now it’s very technical and not suitable for the general public. I’m also disappointed in its performance on a smaller testing cluster (which I understand is unfair given what it’s designed for). For it to shine, you need both: the problem has to be big enough and the server farm has to be big enough. However, I think many companies initially test Hadoop on a small cluster before actually investing more time and money into it, and it’s the first impression that makes a lasting impact. As it matures, I expect there will be overhead-reduction optimizations for small/low-end clusters.

Setting up MySQL is easy; scaling it is not so easy, but there is plenty of related software and technology to help you. Don’t think you can just switch to Hadoop/HBase/Hive in a day, though. The selling point is there (no-limit scaling on commodity hardware, at the core of the design), but there are many land mines you could step on if decisions are not evaluated carefully. Right now, I see Hadoop as one of the last resorts, for when you’re running into a wall after exhausting the RDBMS options and the related technologies that help you scale: memcached, message queues, load balancing, etc. You should not choose Hadoop just because you started a company and might get big in a couple of years. Of course there are exceptions, when you know your problem domain is only solvable in a distributed system. Hadoop’s popularity could change (or not) depending on whether the priority is to dominate both markets or just to focus on the large farms.

You face complexity when dealing with Hadoop/HBase/Hive/HDFS (setting things up, breaking problems down into tasks, and setting up batch operations). For many, many applications, MySQL (or an RDBMS in general) ain’t going anywhere. I see smart companies using both for different parts of their operations. Unless Hadoop can do real-time, low-latency operations in distributed server farms effortlessly, there is no clear winner now, or ever. Maybe the trend toward real-time search (Twitter, Facebook) can speed this up.

Hive troubleshooting

I am playing with Hadoop and Hive via the Cloudera RPMs. Development is very active, which means it can be hard to track down errors or find help with a specific one.

Permission of /tmp in HDFS

FAILED: Unknown exception : org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=mathie, access=WRITE, inode="tmp":hadoop:supergroup:rwxrwxr-x

Solution: You need to turn on full write permissions for /tmp

sudo -u hadoop hadoop fs -chmod 777 /tmp

.hivehistory

[root@r2 tmp]# sudo -u hadoop hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_200911142019_988931842.txt
java.io.FileNotFoundException: /.hivehistory (Permission denied)
 at java.io.FileOutputStream.open(Native Method)
 at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:179)
 at java.io.FileOutputStream.&lt;init&gt;(FileOutputStream.java:131)
 at java.io.FileWriter.&lt;init&gt;(FileWriter.java:73)
 at jline.History.setHistoryFile(History.java:45)
 at jline.History.&lt;init&gt;(History.java:37)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:298)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

It means your $HOME is empty and Hive is trying to create /.hivehistory at the filesystem root, which of course is not possible. Solution: make sure you run Hive as a real user with a $HOME (“echo $HOME” to check), not via sudo.

/etc/hive/conf/hive-site.xml

hive> show tables;
FAILED: Error in metadata: javax.jdo.JDODataStoreException: SQL exception: Add classes to Catalog "", Schema "APP"
NestedThrowables:
java.sql.SQLNonTransientConnectionException: No current connection.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Time taken: 3.142 seconds

Even with the embedded mode for the metastore, if you run into this problem, it looks like an issue with the Cloudera RPM for Hive: the ${user.name} placeholder is not being replaced properly.

Solution: change “${user.name}” to a regular folder name and Hive works fine.
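
A quick way to apply the fix, assuming the default Cloudera config path; “hive” as the fixed folder name is an arbitrary choice on my part (any regular directory name works):

```shell
# Replace the unexpanded ${user.name} token in hive-site.xml with a
# fixed directory name ("hive" here). Keeps a backup of the original.
CONF=${CONF:-/etc/hive/conf/hive-site.xml}
if [ -w "$CONF" ]; then
  cp "$CONF" "$CONF.bak"
  sed -i 's|\${user\.name}|hive|g' "$CONF"
  echo "patched $CONF"
fi
```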

haproxy vs. LVS (layer 7 vs layer 4 load balancing)

We just deployed our first haproxy load balancer and are still running several LVS-TUN load balancers. Even though haproxy is advertised as fast and lightweight, that’s in comparison with other layer-7 load balancers, not with a layer-4 LB like LVS.

Load Average / CPU Usage

haproxy still requires much more resources. On the same really basic server (Pentium 3/4 or something really slow), the LVS load average is always at or near zero, even with many incoming requests. haproxy’s load average is about 0.3 to 0.5.

Features

But the good thing about haproxy is that it has more features and is more flexible in terms of configuration. LVS-TUN is best when the ISP allows outgoing packets to carry the LB’s IP as the source address (which look like spoofed packets). If you don’t have that option, haproxy is the next best thing, assuming you don’t need haproxy’s HTTP header inspection, which LVS cannot offer because it operates at layer 4.

Bandwidth Utilization

LVS-TUN only handles the incoming half of each request (replies go directly from the real servers to the client), so its bandwidth requirement is roughly half that of a full proxy like haproxy (the same applies to LVS-NAT).

SSL

LVS-TUN handles it effortlessly because it does not touch the content of the packets at all. haproxy can deal with SSL in two ways:

  • via the TCP option (haproxy acts as a layer-4 LB). Pros: easy. Cons: you won’t be able to get the client IP, which for some apps is a deal breaker.
  • Stunnel runs on the same machine as haproxy, terminates the SSL, then forwards to haproxy as a standard request. Pros: the client IP is passed along, using the patch provided on haproxy’s website. Cons: it could slow down the LB machine if there are many SSL requests, and you need to set up SSL between haproxy and the workers if the data is really sensitive.
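
If you go the stunnel route, the stunnel side could look something like this sketch (the cert path, the ports, and the address haproxy listens on are all assumptions for illustration):

```ini
; stunnel terminates SSL on 443 and hands plain HTTP to haproxy,
; which is assumed to listen on 127.0.0.1:80 for the decrypted traffic.
cert = /etc/stunnel/stunnel.pem
pid = /var/run/stunnel.pid

[https]
accept = 443
connect = 127.0.0.1:80
```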

Conclusion

Both haproxy and LVS have their own space. Use LVS-TUN when possible for the best performance and scalability. haproxy is best when you need header inspection and LVS-TUN is not possible with the ISP/network.

Monitor LSI MegaRAID under CentOS

The documentation is not very user friendly, but I guess at least it runs!

Basic Monitor Script
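
A minimal sketch of such a check, assuming MegaCli is in the PATH and the controller is adapter 0; the “Firmware state” strings may differ between MegaCli versions, so verify against your own -PDList output:

```shell
#!/bin/sh
# Basic RAID health check: counts drive states in MegaCli output.
# Assumes MegaCli is installed and the controller is adapter 0 (-a0).
echo "Checking RAID status on $(hostname)"
info=$(MegaCli -AdpAllInfo -a0 2>/dev/null)
echo "Controller a0: $(echo "$info" | sed -n 's/^Product Name *: *//p')"
echo "No of Physical disks online : $(MegaCli -PDList -a0 2>/dev/null | grep -c 'Firmware state: Online')"
echo "Failed Disks : $(MegaCli -PDList -a0 2>/dev/null | grep -c 'Firmware state: Failed')"
```

Cron this and pipe the output to mail if any of the counts look wrong.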

Sample Output

Checking RAID status on xxx
Controller a0:  MegaRAID SAS 8344ELP
No of Physical disks online : 4
Degraded : 0
Failed Disks : 0

Upgrade Firmware

To determine the current firmware, run “MegaCli -AdpAllInfo -a0”

Product Name    : MegaRAID SAS 8344ELP
Serial No       : P00253390X
FW Package Build: 7.0.1-0064

                    Mfg. Data
                ================
Mfg. Date       : 09/27/06
Rework Date : 00/00/00
Revision No     : 8

                Image Versions In Flash:
                ================
Boot Block Version : R.2.3.15
BIOS Version       : MT33
MPT Version        : MPTFW-01.18.79.00-IT
FW Version         : 1.12.220-0560
WebBIOS Version    : 1.1-33g-e_11-Rel
Ctrl-R Version     : 1.04-019A

Check the LSI website for the current downloads, in this case:

http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_8344elp/index.html

- Download the firmware, unzip
- Run "MegaCli -adpfwflash -f SAS1068_FW_Image.rom -a0"

init.d script for gearmand

init.d script for stunnel on CentOS

You might need to modify some settings to suit your installation. I installed from source.

whereis stunnel
(might need to ln -s /usr/local/bin/stunnel /usr/sbin/stunnel)

vi /etc/init.d/stunnel
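
Here is a minimal sketch of what the script could contain, assuming the /usr/sbin/stunnel symlink above, a config at /etc/stunnel/stunnel.conf, and a matching “pid =” setting in that config; adjust paths for your installation:

```shell
#!/bin/sh
# stunnel       Start/stop the stunnel daemon
# chkconfig: 2345 80 20
STUNNEL=${STUNNEL:-/usr/sbin/stunnel}
CONF=${CONF:-/etc/stunnel/stunnel.conf}
PIDFILE=${PIDFILE:-/var/run/stunnel.pid}   # must match "pid =" in stunnel.conf

start() {
    printf "Starting stunnel: "
    "$STUNNEL" "$CONF" && echo OK || echo FAILED
}
stop() {
    printf "Stopping stunnel: "
    if [ -f "$PIDFILE" ]; then
        kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE" && echo OK || echo FAILED
    else
        echo "not running"
    fi
}
status() {
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "stunnel is running (pid $(cat "$PIDFILE"))"
    else
        echo "stunnel is stopped"
    fi
}
case "$1" in
    start|stop|status) "$1" ;;
    restart) stop; start ;;
    "") : ;;   # no action given, do nothing
    *) echo "Usage: $0 {start|stop|restart|status}" ;;
esac
```

Then `chmod +x /etc/init.d/stunnel` and `chkconfig --add stunnel` to enable it at boot.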