
Bandwidth cost for EC2/cloud computing

Cloud computing is often advertised on the per-hour instance cost (say, 10 cents per hour). Cheap, right? Not necessarily. There are many additional charges (bandwidth, I/O, and so on) that can become much more expensive than the cost of running the instance itself. Any wise company that wants to invest its time in any cloud must crunch the numbers first. SoftLayer at 10 cents/GB is the most reasonable rate so far for the quality. Dedicated hosting companies will continue to have their share because they can oversell (to a degree, some aggressively, some conservatively) when pooling together many clients: some use only 10% of their allocated bandwidth, some use 100%. On clouds, everything is metered on demand.

Thus, the best choice is a hybrid approach: use clouds for bursts and surges, and keep the core infrastructure on dedicated servers where deals can be found (i.e., good hosting companies that oversell a little and still provide quality service). And of course, every medium/large online system should be designed with outage prevention in mind (high availability, redundancy, no single point of failure). This is why many dedicated providers (SoftLayer, ThePlanet, LayeredTech, RackSpace, ServerBeach, etc.) offer their own cloud. Choices are good and the pie is getting bigger.
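
A quick back-of-the-envelope illustration (my own round numbers, not any provider's price sheet): a server pushing 10 TB/month at 10 cents/GB runs about $1,000/month in bandwidth alone, while a 10-cent/hour instance running nonstop costs only about $72/month (720 hours). The bandwidth bill can dwarf the instance bill by an order of magnitude.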

haproxy and stunnel

This is a quick reference for installing haproxy and stunnel to support SSL forwarding (with the X-Forwarded-For IP forwarding patch).

wget http://haproxy.1wt.eu/download/1.3/src/haproxy-1.3.22.tar.gz
tar -zxf haproxy-1.3.22.tar.gz
cd haproxy-1.3.22
make TARGET=linux26
cp haproxy /usr/sbin/haproxy
vi /etc/haproxy.cfg
...
vi /etc/init.d/haproxy
...
chmod +x /etc/init.d/haproxy 
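
For reference, a minimal /etc/haproxy.cfg for this setup might look like the sketch below. This is only a sketch: the listen address, backend IPs, and timeout values are placeholder assumptions; the chroot path and user match the ones created in the next step.

global
    log 127.0.0.1 local0
    maxconn 4096
    chroot /var/chroot/haproxy
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    retries 3
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen web 0.0.0.0:80
    balance roundrobin
    option forwardfor
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check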

useradd haproxy 
mkdir -p /var/chroot/haproxy 
chown haproxy:haproxy /var/chroot/haproxy 
chmod 700 /var/chroot/haproxy

service haproxy start
chkconfig --add haproxy 

vi /etc/sysconfig/syslog
SYSLOGD_OPTIONS="-m 0 -r"

vi /etc/syslog.conf
local0.* /var/log/haproxy.log
local1.* /var/log/haproxy-1.log

Stunnel with HAProxy patch

yum remove stunnel
yum install openssl-devel openssl

wget http://www.stunnel.org/download/stunnel/src/stunnel-4.22.tar.gz
tar -xzf stunnel-4.22.tar.gz

cd stunnel-4.22
wget http://haproxy.1wt.eu/download/patches/stunnel-4.22-xforwarded-for.diff
patch -p1 < stunnel-4.22-xforwarded-for.diff

./configure --disable-fips
make
make install
mkdir -p /etc/stunnel
vi /etc/stunnel/stunnel.conf
....
vi /etc/init.d/stunnel
....
vi /etc/stunnel/your.pem
....
ln -s /usr/local/bin/stunnel /usr/sbin/stunnel
chmod +x /etc/init.d/stunnel
service stunnel start
chkconfig --add stunnel 
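
For reference, a minimal /etc/stunnel/stunnel.conf for this setup might look like the sketch below. This is only a sketch: the accept/connect ports and pid path are placeholder assumptions, the cert path matches the your.pem created above, and xforwardedfor is the option the patch adds.

cert = /etc/stunnel/your.pem
pid = /var/run/stunnel.pid

[https]
accept = 443
connect = 127.0.0.1:80
xforwardedfor = yes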

Install APC automatically via script

If you try to install APC via scripting, you might hit the interactive prompt asking about "apxs". How do you bypass that? Use expect ("yum install expect"). This script will solve your problem:

#!/usr/bin/expect
# Disable expect's default 10-second timeout; compiling APC can take longer
set timeout -1
# Run the interactive installer and answer the apxs prompt automatically
spawn pecl install apc
expect "Use apxs to set compile flag"
send "yes\r"
# Wait for the build to report success, then for pecl to exit
expect "install ok"
expect eof

Auto Scaling

The idea is really cool and cost-efficient. However, the actual implementation is not as easy as it should be. There are vendors trying to bridge the gap, and I believe it will be much easier in the future.

Problem at 2AM

For many services, usage fluctuates over the day (and the week). For example, in our own pattern, usage bottoms out from 2AM to 8AM (PST). Servers sit idle, which wastes money and electricity. The solution is to scale down during this period: maintain a core capacity and add/terminate servers on demand. That's what the cloud computing marketing hype is supposed to deliver, but I'd guess only a few companies take full advantage of it because the level of automation is still very low.

Problem with existing data

Say you have a cluster of 10 servers, and at 2AM you only need 5; what do you do with the rest? It's tempting to simply shut them down. Not so fast! What about the data on those servers? If your app simply serves static/dynamic pages and does central logging (a scaling problem of its own) elsewhere, then this is possible. But if your application generates data and needs to process it in some way, you have to deal with this data before termination. Below are a few possible solutions. Please feel free to add your comments/suggestions, and I'm sure there are better ways.

Decouple data storage and application layer

This is good practice for isolating different layers. However, it comes with a performance trade-off. If your app writes a lot (e.g., logging) into central storage or a database, many app servers can overload the master DB with writes per second; the DB then needs to scale out, making the problem more complicated, and relying on central storage can itself be a single point of failure.

Process before destroy

It depends on how fast the data processing can take place; if a server needs 4 hours to process its data, the off-peak hours have already passed.

Move data to another peer before destroy

Peers help other peers. The dying instance sends all its data to another instance and then dies (hey, just like people). The problem here is dealing with the merging of data (e.g., auto-increment keys). I think this is the best way for our particular situation (many, many small writes per second), since any single instance only holds a small portion of the data (vs. a central database), and it still follows KISS (keep it simple, stupid). A sketch of the idea follows.
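
Here is a minimal sketch of how this could work with MySQL, assuming each instance keeps its partial data in a local database (the database, table, and peer names are placeholders). Giving each server a distinct auto-increment offset is one way to sidestep the key-collision problem when merging:

# /etc/my.cnf on server N of a cluster of up to 10, so that auto-increment
# keys generated on different instances can never collide when merged:
#   auto_increment_increment = 10
#   auto_increment_offset    = N

# At 2AM, the dying instance ships its rows to a surviving peer, then shuts down:
mysqldump --single-transaction --no-create-info app_db events | ssh peer1 "mysql app_db"
shutdown -h now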

Any thoughts on improvements or other alternatives?

PhoneGap fills a gap to the mobile world

Mobile is going wild! Mobile is going to take over! You hear this many, many times in the news, and I have to agree for the most part. It's the convenience, the portability, the accessibility, the entertainment, and also the un-productivity that make mobile devices great.

API to bridge web app and native app

I just discovered PhoneGap.com the other day while evaluating my options for developing an iPhone app. I have not gotten into the technical documents or prototyping yet, but it's a great concept and should be promoted to more web developers like myself.

Gap between web developers (HTML, CSS, JS, back-end) and Objective-C

This is a big gap for me, since I was introduced to the Mac world not too long ago. Even with formal university training in C, C++, and Java, plus many self-taught web technologies, learning a new thing is good but takes time, which could mean delaying other, more important things. So the concept (ideally) of PhoneGap is great: you simply port your existing fine-tuned web app to a native iPhone app of pretty much the same quality (optimized for the smaller screen, of course). It's a great idea/business pitch, very simple to understand, and yet it fills a great need.

Gap between a pro app builder and a single-web-service provider

For us, we don't want to build a vast array of apps, and we don't develop apps for other businesses. We simply want to port our existing web service (an ad management and tracking solution) to a native app so our clients can access it anywhere, on the desktop and on the go. If we went the Objective-C route, we would need to either build it in-house or out-source to an app builder. Either way, we do not want to expose our core engines to the app (for security and modularity), so the app would use our API, which we have already made public for our clients.

In-house

If the app-building process and knowledge can become a core competency, meaning we will focus on mobile apps, then it makes sense. If there is some overlap between these platforms and our existing development, it also makes sense. I like learning a new language, and new things in general, but not when pushed by deadlines and pressure. It's quite a different world, and frankly I don't see myself getting too deep into it unless we have no other choice.

Out-sourcing

Easy and fast, with many options, developers, and firms out there competing for your business. However, this approach could get expensive with multiple apps and on-going maintenance. We frequently add new features and improve existing ones; upgrades go out daily to weekly. With out-sourcing, we would need to request service from the external firm to push these new features and updates regularly. If our update frequency were slower, choosing to out-source would be a straightforward decision.

Gap between different mobile platforms

The iPhone is still the biggest player in town, but others like WebOS and Android are getting bigger. Call them what you want: copycats, imitators. It's part of the evolution, and a needed one. So instead of paying one Objective-C team and one Java team (or teams for other languages) to port your app, what's better than writing one web app that works from just about any browser, then porting/compiling it to a native app on all platforms with ease (ideally)?

Reality

Of course I understand PhoneGap is still at a very early stage, with little up-to-date documentation, few tutorials, and an active codebase (unstable interface). But it will stabilize and gain more popularity and support. Of course you can argue that performance, stability, etc. cannot match a fully native app written in Objective-C. Yes, but everything comes with trade-offs. Some apps won't be able to use these API bridges (e.g., graphics-intensive games), but some apps (like our ad management and ad serving solution) are ideal candidates for PhoneGap (and similar alternatives). I have only been reading up on PhoneGap and the iPhone app process recently. When can we stop learning and focus on doing? I guess the answer is never, and if/when it happens, it's a bad thing. So the goal is to maintain a good balance, and things will flow along.

DataTable jQuery Plugin – JavaScript inside a table cell

If the data source is AJAX and a cell contains JavaScript, the node is created and assigned directly via nTd.innerHTML=data, so any JavaScript code inside the cell won't get executed. To correct that, you need a custom fnRender for that column, as in the sketch below.
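
Here is a minimal sketch of the fnRender approach (DataTables 1.x API; the column layout, row data, and the doSomething handler are placeholder assumptions):

$('#example').dataTable({
    "sAjaxSource": "/data.txt",
    "aoColumns": [
        null,
        null,
        { "fnRender": function (oObj) {
            // Build the cell's behavior here instead of embedding <script>
            // tags in the AJAX data, so it runs as real JavaScript instead
            // of sitting inert inside innerHTML
            return '<a href="#" onclick="doSomething(' + oObj.aData[0] + '); return false;">run</a>';
        } }
    ]
});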

Hadoop vs. MySQL

I have just played with Hadoop, HBase, Hive, and Pig via Cloudera's guide (thanks to Cloudera for bringing these packages to CentOS) for a couple of days. Cloudera is going in the right direction, targeting the enterprise. Hadoop is definitely on the watch list as it matures, but right now it's very technical and not suitable for the general public. I'm also disappointed in its performance on a smaller testing cluster (which I understand is unfair given what it's designed for). For it to shine, you need both: the problem has to be big enough and the server farm has to be big enough. However, I think many companies initially test Hadoop on a small cluster before investing more time and money into it, and it's the first impression that makes a lasting impact. As it matures, I expect there will be overhead-reduction optimizations for small/low-end clusters.

Setting up MySQL is easy; scaling it is not so easy, but there is plenty of related software and technology to help you. Don't think you can just switch to Hadoop/HBase/Hive in a day, though. The selling point is there (no-limit scaling on commodity hardware by core design), but there are many land mines you could step on if decisions are not evaluated carefully. Right now, I see Hadoop as one of the last resorts, for when you're running into a wall after exhausting the RDBMS options and the software/technology that helps you scale, like memcache, message queues, load balancing, etc. You should not choose Hadoop just because you started a company that might get big in a couple of years. Of course, there are exceptions when you know your problem domain is only solvable in a distributed system. The popularity of Hadoop could change (or not) depending on whether the priority is to dominate both markets or to focus on the large farms.

You face complexity when dealing with Hadoop/HBase/Hive/HDFS (setting things up, breaking work down into tasks, and setting up batch operations). For many, many applications, MySQL (or an RDBMS) ain't going anywhere. I see smart companies using both for different parts of their operations. Unless Hadoop can do real-time, low-latency operations on distributed server farms effortlessly, there is no clear winner now, or ever. Maybe the trend toward real-time search (Twitter, Facebook) will speed this up.

Hive troubleshooting

I am playing with Hadoop and Hive via the Cloudera RPMs. Development is very active, which means it can be hard to track down errors or find help with a specific one.

Permission of /tmp in HDFS

FAILED: Unknown exception : org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=mathie, access=WRITE, inode="tmp":hadoop:supergroup:rwxrwxr-x

Solution: you need to turn on full write permissions for /tmp in HDFS:

sudo -u hadoop hadoop fs -chmod 777 /tmp

.hivehistory

[root@r2 tmp]# sudo -u hadoop hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_200911142019_988931842.txt
java.io.FileNotFoundException: /.hivehistory (Permission denied)
 at java.io.FileOutputStream.open(Native Method)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
 at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
 at java.io.FileWriter.<init>(FileWriter.java:73)
 at jline.History.setHistoryFile(History.java:45)
 at jline.History.<init>(History.java:37)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:298)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

It means $HOME is empty, so Hive is trying to create /.hivehistory at the filesystem root, which of course is not possible. Solution: make sure you run Hive as a real user with $HOME set ("echo $HOME" to check), not via sudo.
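
If you do want to run Hive as the hadoop user, either of these should give it a real $HOME (a quick sketch; -H and the login shell both pull the environment from the target user's account):

sudo -u hadoop -H hive
su - hadoop -c hive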

/etc/hive/conf/hive-site.xml

hive> show tables;
FAILED: Error in metadata: javax.jdo.JDODataStoreException: SQL exception: Add classes to Catalog "", Schema "APP"
NestedThrowables:
java.sql.SQLNonTransientConnectionException: No current connection.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Time taken: 3.142 seconds

Even with the embedded mode for the metastore, if you run into this problem, it looks like an issue with the Cloudera RPM for Hive, where ${user.name} in the configuration is not replaced properly.

Solution: change "${user.name}" to a regular folder and Hive works fine.
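
A sketch of what the fix could look like in /etc/hive/conf/hive-site.xml, assuming the embedded Derby metastore; javax.jdo.option.ConnectionURL is Hive's standard connection property, but the exact path shipped in the RPM may differ:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <!-- was a path containing ${user.name}; point it at a fixed folder instead -->
  <value>jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true</value>
</property>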

haproxy vs. LVS (layer 7 vs layer 4 load balancing)

We just deployed our first haproxy load balancer and are still running several LVS-TUN load balancers. Even though haproxy is advertised as fast and lightweight, that comparison is against other layer-7 load balancers, not layer-4 ones like LVS.

Load Average / CPU Usage

haproxy still requires considerably more resources. On the same really basic server (a Pentium 3/4 or something similarly slow), the LVS load average stays at or near zero even with many incoming requests, while haproxy's load average is about 0.3 to 0.5.

Features

But the good thing about haproxy is that it has more features and is more flexible in terms of configuration. LVS-TUN is best when the ISP allows outgoing packets to carry the LB's IP (which looks like spoofed packets). If you don't have that option, haproxy is the next best thing. All of this assumes you don't need haproxy's HTTP header inspection, which LVS cannot do because it operates at layer 4.

Bandwidth Utilization

LVS-TUN only handles the incoming portion of each request (responses go straight from the real servers to the client), so its bandwidth requirement is about half that of a full proxy like haproxy (the same goes for LVS-NAT).

SSL

LVS-TUN handles SSL effortlessly because it does not deal with the content of the packets at all. haproxy can deal with SSL in two ways:

  • Via TCP mode (haproxy acts as a layer-4 LB). Pros: easy. Cons: you won't be able to get the client IP, which for some apps is a deal breaker. See the sketch after this list.
  • Stunnel runs on the same machine as haproxy to terminate SSL, then forwards to haproxy as a standard request. Pros: the client IP is passed along via the patch provided on haproxy's website. Cons: it could slow down the LB machine if there are many SSL requests, and for really secure data you also need SSL between haproxy and the workers.
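
A minimal sketch of the first option, TCP-mode SSL passthrough in haproxy.cfg (the backend addresses are placeholders):

listen ssl-passthrough 0.0.0.0:443
    mode tcp
    balance roundrobin
    server web1 10.0.0.11:443 check
    server web2 10.0.0.12:443 check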

Conclusion

Both haproxy and LVS have their own space. Use LVS-TUN when possible for the best performance and scalability. haproxy is best when you need header inspection or when LVS-TUN is not possible with your ISP/network.

Monitor LSI MegaRAID under CentOS

The documentation is not very user-friendly, but I guess at least it runs!

Basic Monitor Script
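
Here is a minimal sketch of such a check, parsing the "Device Present" section of "MegaCli -AdpAllInfo -a0" (the awk patterns are assumptions based on MegaCli's usual output labels):

#!/bin/bash
# Sketch: summarize RAID health from MegaCli adapter info.
# Assumes MegaCli is in $PATH and its output labels match the patterns below.
echo "Checking RAID status on $(hostname)"
INFO=$(MegaCli -AdpAllInfo -a0)
echo "Controller a0: $(echo "$INFO" | awk -F' *: *' '/^Product Name/ {print $2}')"
echo "No of Physical disks online : $(echo "$INFO" | awk -F' *: *' '/^ *Disks/ {print $2}')"
echo "Degraded : $(echo "$INFO" | awk -F' *: *' '/^ *Degraded/ {print $2}')"
echo "Failed Disks : $(echo "$INFO" | awk -F' *: *' '/^ *Failed Disks/ {print $2}')"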

Sample Output

Checking RAID status on xxx
Controller a0:  MegaRAID SAS 8344ELP
No of Physical disks online : 4
Degraded : 0
Failed Disks : 0

Upgrade Firmware

To determine the current firmware, run "MegaCli -AdpAllInfo -a0":

Product Name    : MegaRAID SAS 8344ELP
Serial No       : P00253390X
FW Package Build: 7.0.1-0064

                    Mfg. Data
                ================
Mfg. Date       : 09/27/06
Rework Date : 00/00/00
Revision No     : 8

                Image Versions In Flash:
                ================
Boot Block Version : R.2.3.15
BIOS Version       : MT33
MPT Version        : MPTFW-01.18.79.00-IT
FW Version         : 1.12.220-0560
WebBIOS Version    : 1.1-33g-e_11-Rel
Ctrl-R Version     : 1.04-019A

Check the LSI website for the current downloads, in this case:

http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_8344elp/index.html

- Download the firmware, unzip
- Run "MegaCli -adpfwflash -f SAS1068_FW_Image.rom -a0"