Archives in January, 2010

Tips for beginners with Puppet (server automation)

Since Puppet is written in Ruby, some of its syntax is Ruby-specific (I wouldn't know for sure, since I haven't learned Ruby yet). I'll try to keep this post updated so that beginners to Puppet who are also strangers to Ruby can get past the unwanted headaches.

Case-sensitivity

Inconsistent casing is a common source of confusion. If you see this error:

"Could not find dependency Class[changeTimeZone] for Node[baseserver]"

And in your nodes.pp you have:

node baseServer {
  require changeTimeZone
}

Changing the names to all lowercase (in the class definition as well) will fix the problem:
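# nodes.pp after the fix (the class must be renamed to changetimezone to match)
node baseserver {
  require changetimezone
}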

Require a definition within a class

The syntax is

exec { "RunSomething":
  command => "abc",
  require => MyClass::MyFunction["def"],
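For context, here is a hypothetical define that the reference above could point to; the names and the echo command are purely illustrative, not from the original post:

define myClass::myFunction() {
  exec { "run-${name}":
    command => "/bin/echo ${name}",
    path    => "/bin:/usr/bin",
  }
}

# somewhere in the catalog, declare the instance named "def" so the reference resolves
myClass::myFunction { "def": }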

Display return code of a shell command

If you need to check the return code of the last command in a shell script, use "$?":

# ifconfig | grep eth0 >/dev/null 2>&1
# echo $?
0 => good/found
# ifconfig | grep eth1111 >/dev/null 2>&1
# echo $?
1 => bad/not found
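In a script, you can branch on it directly, for example:

ifconfig | grep eth0 >/dev/null 2>&1
if [ $? -eq 0 ]; then
    echo "eth0 found"
else
    echo "eth0 not found"
fi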

Bandwidth cost for EC2/cloud computing

Cloud computing is often advertised on the per-hour instance cost (like 10 cents per hour). Cheap, right? Not necessarily. There are many additional charges (bandwidth, I/O, etc.) that can end up far more expensive than the cost of running the instance itself. Any wise company that wants to invest its time in a cloud must crunch the numbers first. SoftLayer at 10 cents/GB is the most reasonable rate I've seen so far for the quality. Dedicated hosting companies will keep their share because they can oversell (to a degree; some aggressively, some conservatively) by pooling many clients together: some use only 10% of their allocated bandwidth, some use 100%. On clouds, it's all on demand. Thus, the best choice is a hybrid approach: use clouds for bursts and surges, and keep the core infrastructure on dedicated servers where deals can be found (i.e. good hosting companies that oversell a little and still provide good quality service). And of course, every medium/large online system should be designed to prevent outages (high availability, redundancy, no single point of failure). This is why many dedicated providers (SoftLayer, ThePlanet, LayeredTech, RackSpace, ServerBeach, etc.) now offer their own clouds. Choice is good and the pie is getting bigger.
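To put rough (hypothetical) numbers on it: a 10-cent/hour instance costs about $73/month (roughly 730 hours), while pushing 2 TB of traffic at 10 cents/GB adds about $200/month, so bandwidth alone can be almost three times the instance cost.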

haproxy and stunnel

This is a quick reference for installing haproxy and stunnel to support SSL forwarding (with the X-Forwarded-For patch, so the client IP is passed through):

wget http://haproxy.1wt.eu/download/1.3/src/haproxy-1.3.22.tar.gz
tar -zxf haproxy-1.3.22.tar.gz
cd haproxy-1.3.22
make TARGET=linux26
cp haproxy /usr/sbin/haproxy
vi /etc/haproxy.cfg
...
vi /etc/init.d/haproxy
...
chmod +x /etc/init.d/haproxy 
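The haproxy.cfg contents are elided above, so here is a minimal sketch for the 1.3 branch; the backend IPs, ports, and timeouts are placeholders I made up, while the user/chroot match the setup below and local0 matches the syslog config at the end of this section:

# /etc/haproxy.cfg - minimal sketch, not the original file
global
    log 127.0.0.1 local0
    maxconn 4096
    user haproxy
    group haproxy
    chroot /var/chroot/haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option forwardfor
    retries 3
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen web 0.0.0.0:80
    balance roundrobin
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check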

useradd haproxy 
mkdir -p /var/chroot/haproxy 
chown haproxy:haproxy /var/chroot/haproxy 
chmod 700 /var/chroot/haproxy

service haproxy start
chkconfig --add haproxy 

vi /etc/sysconfig/syslog
SYSLOGD_OPTIONS="-m 0 -r"

vi /etc/syslog.conf
local0.* /var/log/haproxy.log
local1.* /var/log/haproxy-1.log
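After editing both files, restart syslog so the local0/local1 entries take effect (assuming a RHEL/CentOS-style init, which the service/chkconfig commands above suggest):

service syslog restart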

Stunnel with HAProxy patch

yum remove stunnel
yum install openssl-devel openssl

wget http://www.stunnel.org/download/stunnel/src/stunnel-4.22.tar.gz
tar -xzf stunnel-4.22.tar.gz

cd stunnel-4.22
wget http://haproxy.1wt.eu/download/patches/stunnel-4.22-xforwarded-for.diff
patch -p1 < stunnel-4.22-xforwarded-for.diff

./configure --disable-fips
make
make install
mkdir -p /etc/stunnel
vi /etc/stunnel/stunnel.conf
....
vi /etc/init.d/stunnel
....
vi /etc/stunnel/your.pem
....
ln -s /usr/local/bin/stunnel /usr/sbin/stunnel
chmod +x /etc/init.d/stunnel
service stunnel start
chkconfig --add stunnel 
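The stunnel.conf contents are also elided; a minimal sketch might look like this (the accept/connect ports and pid path are placeholders, the cert path matches the your.pem step above, and xforwardedfor is the service-level option the patch adds):

# /etc/stunnel/stunnel.conf - minimal sketch, not the original file
cert = /etc/stunnel/your.pem
pid = /var/run/stunnel.pid

[https]
accept = 443
connect = 127.0.0.1:80
xforwardedfor = yes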

Install APC automatically via script

If you try to install APC via a script, you might hit the interactive prompt asking about "apxs". How do you bypass it? Use expect ("yum install expect"). This script will solve the problem:

#!/usr/bin/expect
# The build can take a while; disable expect's default 10-second timeout
# so the script does not give up mid-compile.
set timeout -1
spawn pecl install apc
expect "Use apxs to set compile flag"
send "yes\r"
expect "install ok"
expect eof

Auto Scaling

The idea is really cool and cost-efficient. However, the actual implementation is not as easy as it should be. There are vendors trying to bridge the gap, and I believe it will get much easier in the future.

Problem at 2AM

For many services, usage fluctuates over the day (and over the week). For example, in our own pattern, traffic bottoms out from 2AM to 8AM (PST). Servers sit idle, which wastes money and electricity. The solution is to scale down during this period: maintain a core capacity and add/terminate servers on demand. That's the marketing hype cloud computing is supposed to deliver, but I'd guess only a few companies take full advantage of it because the level of automation is still very low.
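As a purely hypothetical sketch of the idea using the ec2-api-tools of the day; the AMI ID, instance count, type, and file paths are all placeholders:

# crontab: drop burst capacity at 2AM, bring it back at 8AM
0 2 * * * /usr/local/bin/scale-down.sh
0 8 * * * /usr/local/bin/scale-up.sh

# scale-up.sh: launch 5 burst instances and record their IDs
#!/bin/sh
ec2-run-instances ami-12345678 -n 5 -t m1.small \
  | awk '$1 == "INSTANCE" { print $2 }' > /var/run/burst-instances

# scale-down.sh: terminate whatever was launched for the burst
#!/bin/sh
ec2-terminate-instances $(cat /var/run/burst-instances)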

Problem with existing data

Say you have a cluster of 10 servers, but at 2AM you only need 5. What do you do with the rest? It's tempting to simply shut them down. Not so fast! What about the data on those servers? If your app simply serves static/dynamic pages and does central logging (a scaling problem of its own) elsewhere, then this works. But if your application generates data and needs to process it in some way, you have to deal with that data before termination. Here are a few possible solutions. Please feel free to add your comments/suggestions; I'm sure there are better ways.

Decouple data storage and application layer

This is good practice for isolating layers. However, it comes with a performance trade-off. If your app writes a lot (logging) into a central storage/database, many app servers can overload the master DB with writes per second; then the DB needs to scale out, which makes the problem more complicated, and relying on central storage can itself become a single point of failure.

Process before destroy

It depends on how fast the data processing can happen: if a server needs 4 hours to process its data, the off-peak window has already passed.

Move data to another peer before destroy

Peers help other peers. The dying instance sends all its data to another instance and then dies (hey, just like people). The problem here is merging the data (e.g. auto-increment keys). I think this is the best way for our particular situation (many, many small writes per second), since any single instance only holds a small portion of the data (vs. a central database) and it still follows KISS (keep it simple, stupid).
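For the auto-increment merge headache, one well-known trick (assuming MySQL, which the post doesn't actually name) is to give each server an interleaved key space so rows from different peers never collide:

# /etc/my.cnf on server 3 of a 10-server cluster
[mysqld]
auto_increment_increment = 10  # step size = cluster size
auto_increment_offset    = 3   # this server's slot (1..10)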

Any thoughts on improvements or other alternatives?