Saturday, January 30, 2010

Tips for beginners with Puppet (server automation)

Since Puppet is written in Ruby, some of its syntax is Ruby-specific (I can't say for sure, since I haven't learned Ruby yet). I'll keep updating this post so that Puppet beginners who are new to Ruby can avoid some unnecessary headaches.

Case-sensitivity

Case sensitivity is a common source of confusion. If you see this error:
"Could not find dependency Class[changeTimeZone] for Node[baseserver]"
And in your nodes.pp you have:
node baseServer {
  require changeTimeZone
}
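Changing the class name to all lowercase (e.g. changetimezone, both in the class definition and in the require) fixes the problem. Notice that the error message already shows the node name in lowercase (baseserver).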

Require a definition within a class

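The syntax is shown below, where MyClass::MyFunction is a definition (define) declared inside the class MyClass, and "def" is the title of the instance you depend on:
exec { "RunSomething":
  command => "abc",
  require => MyClass::MyFunction["def"],
}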

Saturday, January 23, 2010

Display the return code of a shell command

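In shell scripts, "$?" holds the return code (exit status) of the last command that ran. For a pipeline like the ones below, that is the status of the last command in the pipeline (here, grep):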
# ifconfig | grep eth0 >/dev/null 2>&1
# echo $?
0 => good/found
# ifconfig | grep eth1111 >/dev/null 2>&1
# echo $?
1 => bad/not found
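In a script you usually want to branch on that value rather than echo it. A minimal sketch: the sample text stands in for real ifconfig output so it runs anywhere, and `status_of` is a hypothetical helper name. The key detail is saving `$?` immediately, because the very next command overwrites it.

```shell
#!/bin/sh
# Sketch: capture $? right after a pipeline and branch on it.
# The sample text stands in for real ifconfig output; status_of is hypothetical.

sample="eth0      Link encap:Ethernet
lo        Link encap:Local Loopback"

status_of() {
    # The pipeline's status is grep's: 0 = found, 1 = not found
    printf '%s\n' "$sample" | grep "$1" >/dev/null 2>&1
    rc=$?               # save it now; the next command overwrites $?
    if [ "$rc" -eq 0 ]; then
        echo "$1: found"
    else
        echo "$1: not found (exit code $rc)"
    fi
    return "$rc"
}

# Usage:
# status_of eth0       # prints "eth0: found"
# status_of eth1111    # prints "eth1111: not found (exit code 1)"
```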

Thursday, January 21, 2010

haproxy and stunnel

This is a quick reference for installing haproxy and stunnel to support SSL forwarding, using the X-Forwarded-For patch for stunnel so the client IP is passed through:
wget http://haproxy.1wt.eu/download/1.3/src/haproxy-1.3.22.tar.gz
tar -zxf haproxy-1.3.22.tar.gz
cd haproxy-1.3.22
make TARGET=linux26
cp haproxy /usr/sbin/haproxy
vi /etc/haproxy.cfg
...
vi /etc/init.d/haproxy
...
chmod +x /etc/init.d/haproxy 

useradd haproxy 
mkdir -p /var/chroot/haproxy 
chown haproxy:haproxy /var/chroot/haproxy 
chmod 700 /var/chroot/haproxy

service haproxy start
chkconfig --add haproxy 

vi /etc/sysconfig/syslog
SYSLOGD_OPTIONS="-m 0 -r"

vi /etc/syslog.conf
local0.* /var/log/haproxy.log
local1.* /var/log/haproxy-1.log

service syslog restart

Stunnel with HAProxy patch

yum remove stunnel
yum install openssl-devel openssl

wget http://www.stunnel.org/download/stunnel/src/stunnel-4.22.tar.gz
tar -xzf stunnel-4.22.tar.gz

cd stunnel-4.22
wget http://haproxy.1wt.eu/download/patches/stunnel-4.22-xforwarded-for.diff
patch -p1 < stunnel-4.22-xforwarded-for.diff

./configure --disable-fips
make
make install
mkdir -p /etc/stunnel
vi /etc/stunnel/stunnel.conf
....
vi /etc/init.d/stunnel
....
vi /etc/stunnel/your.pem
....
ln -s /usr/local/bin/stunnel /usr/sbin/stunnel
chmod +x /etc/init.d/stunnel
service stunnel start
chkconfig --add stunnel 

Install APC automatically via script

If you try to install APC from a script, you may run into an interactive prompt asking about "apxs". How do you bypass it? Use expect ("yum install expect"). This script solves the problem:
#!/usr/bin/expect
# Compiling APC can take longer than expect's default 10-second timeout
set timeout -1
spawn pecl install apc
expect "Use apxs to set compile flag"
send "yes\r"
expect "install ok"
expect eof
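If you'd rather not install expect, a commonly used workaround is to pipe the answer into the installer's standard input (e.g. `printf "\n" | pecl install apc` to accept the default). This is a sketch under an assumption: it only works if the program reads its answer from stdin rather than directly from the terminal, so verify it on your system. `answer_with` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: answer an interactive prompt by piping text to the command's stdin.
# Assumption: the program reads the answer from stdin (not /dev/tty).
# answer_with is a hypothetical helper.

answer_with() {
    answer=$1
    shift
    printf '%s\n' "$answer" | "$@"
}

# Usage:
# answer_with yes pecl install apc
```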

Monday, November 16, 2009

Hadoop vs. MySQL

I've been playing with Hadoop, HBase, Hive, and Pig for a couple of days via Cloudera's guide (thanks to Cloudera for bringing these packages to CentOS). Cloudera is heading in the right direction by targeting enterprises. Hadoop is definitely on my watch list as it matures, but right now it's very technical and not suitable for the general public. I'm also disappointed with its performance on a small test cluster (which I understand is unfair, given what it's designed for). For it to shine, you need both a big enough problem and a big enough server farm. However, I think many companies initially test Hadoop on a small cluster before investing more time and money in it, and first impressions make a lasting impact. As it matures, I expect optimizations that reduce its overhead on small/low-end clusters.

Setting up MySQL is easy; scaling it is not so easy, but there is plenty of related software and technology to help you. Don't think you can just switch to Hadoop/HBase/Hive in a day. The selling point is there (unlimited scaling on commodity hardware is part of the core design), but there are many land mines you could step on if decisions are not evaluated carefully. Right now, I see Hadoop as a last resort for when you're running into a wall after exhausting your RDBMS options and the related software/technology that helps you scale, like memcache, message queues, load balancing, etc. You should not choose Hadoop just because you started a company and might get big in a couple of years. Of course, there are exceptions when you know your problem domain is only solvable with a distributed system. Hadoop's popularity could change (or not) depending on whether its priority is to dominate both markets or just to focus on large farms.

You face complexity when dealing with Hadoop/HBase/Hive/HDFS (setting things up, breaking problems down into tasks, and setting up batch operations). For many, many applications, MySQL (or any RDBMS) ain't going anywhere. I see smart companies using both for different parts of their operations. Unless Hadoop can do real-time, low-latency operations across distributed server farms effortlessly, there is no clear winner now, or maybe ever. The trend toward real-time search (Twitter, Facebook) might speed this up.

Saturday, November 14, 2009

Hive troubleshooting

I am playing with Hadoop and Hive via the Cloudera RPMs. Development is very active, which can make it hard to track down errors or find help with a specific one.

Permission of /tmp in HDFS

FAILED: Unknown exception : org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=mathie, access=WRITE, inode="tmp":hadoop:supergroup:rwxrwxr-x
Solution: grant full write permissions on /tmp in HDFS:
sudo -u hadoop hadoop fs -chmod 777 /tmp

.hivehistory

[root@r2 tmp]# sudo -u hadoop hive
Hive history file=/tmp/hadoop/hive_job_log_hadoop_200911142019_988931842.txt
java.io.FileNotFoundException: /.hivehistory (Permission denied)
 at java.io.FileOutputStream.open(Native Method)
 at java.io.FileOutputStream.(FileOutputStream.java:179)
 at java.io.FileOutputStream.(FileOutputStream.java:131)
 at java.io.FileWriter.(FileWriter.java:73)
 at jline.History.setHistoryFile(History.java:45)
 at jline.History.(History.java:37)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:298)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
 at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
It means your home directory came up empty, so Hive tried to create /.hivehistory at the top level of the filesystem, which of course a normal user cannot do. Solution: make sure you run Hive as a real user with a $HOME ("echo $HOME" to check), not via sudo.
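A quick pre-flight check can catch this before Hive starts. This is a sketch: `check_home_dir` is a hypothetical helper, and it assumes the directory to check is either $HOME or the home directory recorded in /etc/passwd (which `getent passwd` prints). Another option is `sudo -H`, which sets HOME to the target user's home directory, assuming that user has one.

```shell
#!/bin/sh
# Sketch: check that a home directory is usable for Hive's .hivehistory file.
# check_home_dir is a hypothetical helper, not part of Hive.

check_home_dir() {
    dir=$1
    # An empty or "/" home is what leads to the /.hivehistory error
    if [ -z "$dir" ] || [ "$dir" = "/" ]; then
        echo "unusable home: '$dir'"
        return 1
    fi
    # The directory must exist and be writable by the current user
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "missing or not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Usage (the user name "hadoop" is an assumption based on the commands above):
# check_home_dir "$HOME"
# check_home_dir "$(getent passwd hadoop | cut -d: -f6)"
```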

/etc/hive/conf/hive-site.xml

hive> show tables;
FAILED: Error in metadata: javax.jdo.JDODataStoreException: SQL exception: Add classes to Catalog "", Schema "APP"
NestedThrowables:
java.sql.SQLNonTransientConnectionException: No current connection.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Time taken: 3.142 seconds
If you run into this problem even with the metastore in embedded mode, it looks like an issue with the Cloudera Hive RPM: the ${user.name} variable in /etc/hive/conf/hive-site.xml is not being substituted properly. Solution: replace "${user.name}" with a regular folder path, and Hive works fine.

Wednesday, November 11, 2009

haproxy vs. LVS (layer 7 vs layer 4 load balancing)

We just deployed our first haproxy load balancer and are still running several LVS-TUN load balancers. Although haproxy is advertised as fast and lightweight, that comparison is against other layer-7 load balancers, not layer-4 load balancers like LVS.

Load Average / CPU Usage

haproxy still requires much more resources. On the same very basic server (a Pentium 3/4 or something similarly slow), the LVS load average is always at or near zero, even with many incoming requests, while haproxy's load average is about 0.3 to 0.5.

Features

The good thing about haproxy is that it has more features and is more flexible to configure. If you don't need haproxy's HTTP header inspection (which LVS lacks because it works at layer 4), LVS-TUN is best when your ISP allows the real servers to send packets with the LB's IP as the source address (spoofed packets). If you don't have that option, haproxy is the next best thing.

Bandwidth Utilization

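LVS-TUN only carries the incoming portion of each connection (the real servers reply directly to the client), so its bandwidth requirement is roughly half that of handling the full exchange, as haproxy and LVS-NAT do. Since responses are usually larger than requests, the savings are often even bigger.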
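A back-of-the-envelope sketch of that difference. `lb_traffic_kb` is a hypothetical helper, the request and response sizes in the usage line are made-up example numbers, and it ignores protocol overhead, including the IP-in-IP encapsulation LVS-TUN adds to forwarded packets.

```shell
#!/bin/sh
# Rough estimate of traffic (in KB) that passes through the load balancer.
# Ignores protocol overhead, including LVS-TUN's IP-in-IP encapsulation.
# lb_traffic_kb is a hypothetical helper.

lb_traffic_kb() {
    requests=$1     # number of requests
    req_kb=$2       # average request size in KB
    resp_kb=$3      # average response size in KB
    mode=$4         # "proxy" (haproxy, LVS-NAT) or "tun" (LVS-TUN)
    case "$mode" in
        proxy) echo $(( requests * (req_kb + resp_kb) )) ;;  # both directions
        tun)   echo $(( requests * req_kb )) ;;              # inbound only
        *)     echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Example with hypothetical sizes: 1000 requests, 1 KB in, 20 KB out
# lb_traffic_kb 1000 1 20 proxy   # prints 21000
# lb_traffic_kb 1000 1 20 tun     # prints 1000
```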

SSL

LVS-TUN handles SSL effortlessly because it never looks at the content of the packets. haproxy can handle SSL in two ways:
  • TCP mode (haproxy acts as a layer-4 LB). Pros: easy. Cons: you can't get the client IP, which for some apps is a deal breaker.
  • Stunnel runs on the same machine as haproxy, decrypts the SSL traffic, and forwards it to haproxy as a standard request. Pros: the client IP is passed along using the patch provided on haproxy's website. Cons: it could slow down the LB machine if there are many SSL requests, and for really sensitive data you also need to set up SSL between haproxy and the workers.

Conclusion

Both haproxy and LVS have their own space. Use LVS-TUN when possible for the best performance and scalability. haproxy is best when you need header inspection and LVS-TUN is not possible with the ISP/network.

Saturday, November 07, 2009

Monitor LSI MegaRAID under CentOS

Not very user-friendly, and the documentation isn't great, but at least it runs!

Basic Monitor Script

Sample Output
Checking RAID status on xxx
Controller a0:  MegaRAID SAS 8344ELP
No of Physical disks online : 4
Degraded : 0
Failed Disks : 0
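A rough sketch of such a check: `raid_status` (a hypothetical name) reads MegaCli output on stdin and looks at the "Degraded" and "Failed Disks" fields shown in the sample output above. It assumes your MegaCli version prints those fields in its `-AdpAllInfo` output; the exit status makes it easy to hook into cron or a monitoring system.

```shell
#!/bin/sh
# Sketch of a RAID health check; raid_status is a hypothetical helper.
# Reads MegaCli output on stdin and looks at the "Degraded" and
# "Failed Disks" fields. Returns 1 if either count is above zero.

raid_status() {
    awk -F':' '
        /^[ \t]*Degraded[ \t]*:/     { deg = $2 + 0 }
        /^[ \t]*Failed Disks[ \t]*:/ { fail = $2 + 0 }
        END {
            printf "Degraded: %d, Failed Disks: %d\n", deg, fail
            if (deg > 0 || fail > 0) exit 1
        }'
}

# Usage (assumes MegaCli is in PATH and prints these fields for -AdpAllInfo):
# MegaCli -AdpAllInfo -a0 | raid_status || echo "RAID problem on $(hostname)"
```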

Upgrade Firmware

To determine the current firmware, run "MegaCli -AdpAllInfo -a0"
Product Name    : MegaRAID SAS 8344ELP
Serial No       : P00253390X
FW Package Build: 7.0.1-0064

                    Mfg. Data
                ================
Mfg. Date       : 09/27/06
Rework Date : 00/00/00
Revision No     : 8

                Image Versions In Flash:
                ================
Boot Block Version : R.2.3.15
BIOS Version       : MT33
MPT Version        : MPTFW-01.18.79.00-IT
FW Version         : 1.12.220-0560
WebBIOS Version    : 1.1-33g-e_11-Rel
Ctrl-R Version     : 1.04-019A
Check the LSI website for the current downloads; in this case: http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_8344elp/index.html
- Download the firmware, unzip
- Run "MegaCli -adpfwflash -f SAS1068_FW_Image.rom -a0"
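To compare versions before and after flashing, a small filter can pull just the version lines out of the long `-AdpAllInfo` output. This is a sketch; `fw_versions` is a hypothetical helper, and the field names are taken from the output above.

```shell
#!/bin/sh
# Sketch: keep only the version lines from MegaCli -AdpAllInfo output (stdin).
# fw_versions is a hypothetical helper; field names match the output above.

fw_versions() {
    grep -E '^(FW Package Build|FW Version|BIOS Version)[[:space:]]*:'
}

# Usage:
# MegaCli -AdpAllInfo -a0 | fw_versions
```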

Friday, November 06, 2009

init.d script for gearmand

Thursday, November 05, 2009

init.d script for stunnel on CentOS

You might need to modify some settings to suit your installation. I installed from source.
whereis stunnel
(might need to ln -s /usr/local/bin/stunnel /usr/sbin/stunnel)

vi /etc/init.d/stunnel

Thursday, October 22, 2009

Centos and BOINC/World Community Grid

There are RPM packages for Fedora, but there doesn't seem to be one for CentOS, and there are no up-to-date instructions for running the BOINC client from the command line.
1) Download and run boinc_6.2.15_i686-pc-linux-gnu.sh
2) Run "./run_client --daemon" to start the client (boinccmd talks to a running client, so start it first)
3) Run "./boinccmd --project_attach www.worldcommunitygrid.org YOUR_ACCOUNT_KEY"
(Log into WCG; the account key can be found under the My Profile menu, near the bottom.)
4) Check "tail -f stdoutdae.txt" for output
You might need to install these for dependencies:
yum install libXcomposite
yum install libXdamage

Friday, October 16, 2009

Optimizing TCP stack under CentOS/RHEL Linux

This is from one of our pretty old servers (load the settings with "sysctl -p" after editing):
more /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) for
# more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# From: http://forum.ev1servers.net/showthread.php?s=&threadid=19647
# Decrease the time default value for tcp_fin_timeout connection
net.ipv4.tcp_fin_timeout = 30
# Decrease the time default value for tcp_keepalive_time connection
net.ipv4.tcp_keepalive_time = 1800
# Turn off the tcp_window_scaling (this limits the TCP window to 64 KB)
net.ipv4.tcp_window_scaling = 0
# Turn off the tcp_sack
net.ipv4.tcp_sack = 0
# Turn off the tcp_timestamps
net.ipv4.tcp_timestamps = 0
# Enable TCP SYN Cookie Protection
net.ipv4.tcp_syncookies = 1
# Enable ignoring broadcasts request
net.ipv4.icmp_echo_ignore_broadcasts = 1
# Disable ICMP Redirect Acceptance
net.ipv4.conf.all.accept_redirects = 0
# Enable bad error message Protection
net.ipv4.icmp_ignore_bogus_error_responses = 1
# Log Spoofed Packets, Source Routed Packets, Redirect Packets
net.ipv4.conf.all.log_martians = 1
# Make more local ports available
# net.ipv4.ip_local_port_range = 1024 65000
# Increase maximum amount of memory allocated to shm
kernel.shmmax = 1073741824
# Improve file system performance (vm.bdflush only exists on 2.4 kernels)
vm.bdflush = 100 1200 128 512 15 5000 500 1884 2
# This will increase the amount of memory available for socket input/output queues
# net.ipv4.tcp_rmem = 4096 25165824 25165824
# net.core.rmem_max = 25165824
# net.core.rmem_default = 25165824
# net.ipv4.tcp_wmem = 4096 65536 25165824
# net.core.wmem_max = 25165824
# net.core.wmem_default = 65536
# net.core.optmem_max = 25165824
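To see which of these values are actually in effect on the running kernel, you can read them back from /proc/sys (`sysctl -n <key>` shows the same thing). A minimal sketch; `show_sysctl` and `sysctl_path` are hypothetical helpers, and keys the kernel doesn't know about (such as vm.bdflush on 2.6 kernels) are reported instead of causing an error.

```shell
#!/bin/sh
# Sketch: print the live value of each sysctl key by reading /proc/sys.
# show_sysctl and sysctl_path are hypothetical helpers.

sysctl_path() {
    # net.ipv4.tcp_fin_timeout -> /proc/sys/net/ipv4/tcp_fin_timeout
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

show_sysctl() {
    for key in "$@"; do
        path=$(sysctl_path "$key")
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$key" "$(cat "$path")"
        else
            printf '%s: not available on this kernel\n' "$key"
        fi
    done
}

# Usage:
# show_sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_window_scaling vm.bdflush
```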

Thursday, August 06, 2009

PHP Error - DOMDocument not found

If you see this error even though your PHP version is already 5.x, install the php-xml package ("yum install php-xml"), which provides the DOM extension on CentOS/RHEL, and restart the web server (just to be sure).
PHP Fatal error:  Class 'DOMDocument' not found in xxx
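To confirm the extension is loaded after installing, you can check the module list printed by `php -m`. A small sketch; `php_has_module` is a hypothetical helper. The command-line PHP may load a different php.ini than the web server, so a test page calling class_exists('DOMDocument') is the more direct check.

```shell
#!/bin/sh
# Sketch: check whether the PHP CLI has a given extension loaded.
# php_has_module is a hypothetical helper; "php -m" lists loaded modules.

php_has_module() {
    # -q: quiet, -i: ignore case, -x: match the whole line
    php -m 2>/dev/null | grep -qix "$1"
}

# Usage:
# php_has_module dom && echo "DOM extension loaded" || echo "DOM missing"
```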

Wednesday, July 22, 2009

Gotchas with gettext in PHP

Be careful when you first try gettext in PHP: it can seem very unreliable/unpredictable, because the locale folders and file names all have to follow its naming conventions.

MyApp.po vs MyApp.mo

The .po file is the human-readable one, where you do the translation. The .mo file is the compiled binary, and it's cached by Apache. So after each change to MyApp.po, recompile it with "msgfmt MyApp.po -o MyApp.mo" and then restart Apache (XAMPP). If you forget to recompile, none of your new changes will show up.

Folder Structure

/MyApp/locale/vi_VN/LC_MESSAGES/MyApp.mo
/MyApp/locale/da_DK/LC_MESSAGES/MyApp.mo
/MyApp/locale/en_US/LC_MESSAGES/MyApp.mo
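With several languages, recompiling each catalog by hand gets tedious. A minimal sketch that compiles every .po found in the layout above, assuming the .po files sit next to the .mo files in each LC_MESSAGES folder; `compile_catalogs` is a hypothetical helper. You still need to restart Apache afterwards.

```shell
#!/bin/sh
# Sketch: compile every .po under <root>/<locale>/LC_MESSAGES/ into a .mo
# next to it. Assumes .po and .mo files share a folder; compile_catalogs
# is a hypothetical helper.

compile_catalogs() {
    root=$1
    for po in "$root"/*/LC_MESSAGES/*.po; do
        [ -e "$po" ] || continue          # glob matched nothing
        mo="${po%.po}.mo"
        if msgfmt "$po" -o "$mo"; then
            echo "compiled $mo"
        else
            echo "FAILED: $po" >&2
        fi
    done
}

# Usage (then restart Apache):
# compile_catalogs /MyApp/locale
```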

Saturday, May 30, 2009

rrdtool and No font found

Recently we upgraded to CentOS 5.3 and rrdtool stopped working, specifically rrdgraph. The reason is that no fonts were installed on the system (not sure why; you can check with "fc-list"). To fix this, run "yum install xorg-x11-fonts-Type1" and make sure "fc-list" shows some fonts. This assumes fontconfig is already installed.

Wednesday, May 27, 2009

apf vs. newly assigned IP address

If ping/traceroute to a newly assigned IP address fails with the message below, there are several places to look. First, disable the firewall; if it works again, you know the firewall (in this case, APF) is the problem. Running "iptables -L -n" might help narrow down the root cause.
PING 173.x.x.x 56(84) bytes of data.
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted
ping: sendmsg: Operation not permitted

Edit conf.apf

Switch this off (change it from 1 to 0) and run "service apf restart":
# Block all ipv4 address space marked reserved for future use (unassigned),
# such networks have no business talking on the Internet. However they may at
# some point become live address space. The USE_RD option further in this file
# allows for dynamic updating of this list on every full restart of APF. Refer
# to the 'internals/reserved.networks' file for listing of address space.
BLK_RESNET="0"
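The same change can be scripted. A minimal sketch, assuming the APF config lives at /etc/apf/conf.apf (the usual location, but check your install); `disable_resnet` is a hypothetical helper that keeps a .bak copy before editing and avoids `sed -i` for portability.

```shell
#!/bin/sh
# Sketch: set BLK_RESNET="0" in an APF config file, keeping a backup.
# The default path below is an assumption; disable_resnet is hypothetical.

disable_resnet() {
    conf=${1:-/etc/apf/conf.apf}
    cp "$conf" "$conf.bak" || return 1
    # Only touch the active assignment line, not the comments above it
    sed 's/^BLK_RESNET="1"/BLK_RESNET="0"/' "$conf.bak" > "$conf"
    grep '^BLK_RESNET=' "$conf"
}

# Usage (then restart APF so the change takes effect):
# disable_resnet && service apf restart
```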

Tuesday, March 31, 2009

Getting started with EC2 tutorial

I just started playing with EC2 today, and these are some useful notes and observations. The small standard instance is quite slow: it only scores 50 under Unixbench-WHT. Our quad-core E5405s are around 410+, and the Q9550 is at 650-750. I was also confused about S3 versus EBS at first. The management console makes life much easier, but could be improved even further. This is what you do after attaching an EBS volume to an instance:
mkfs -t ext3 /dev/sdx
mkdir /mnt/data-store
mount /dev/sdx /mnt/data-store
For the CentOS AMI (ami-0459bc6d), you need to run this before you can customize the AMI and build your own:
depmod -a
modprobe loop
As for performance, it's pretty disappointing. I ran Unixbench WHT 4.1: the small instance only got 51 (CentOS 5 32-bit) and the large instance got 111 (CentOS 5 64-bit). Our quad-core servers usually get 600+ on the same test, so if you run an instance 24/7, it's much more expensive than a dedicated server with the same performance. What you pay for with EC2 is "on-demand" capacity, so you need to automate things to the point where scaling up during peaks is seamless, instead of running one or even several instances all the time.

Saturday, March 07, 2009

Adaptec RAID under CentOS

Download the Storage Manager RPM for your card model and architecture from their site: http://www.adaptec.com/en-US/support/raid/
If you run /usr/StorMan/arcconf and get this error:
./arcconf: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory
Do this:
export LD_PRELOAD=/usr/StorMan/libstdc++.so.5
[root@r StorMan]# ./arcconf 

  | UCLI |  Adaptec uniform command line interface
  | UCLI |  Version 5.30 (B17509)
  | UCLI |  (C) Adaptec 2003-2008
  | UCLI |  All Rights Reserved

 COPYBACK      | toggles controller copy back mode
 CREATE        | creates a logical device
 DATASCRUB     | toggles the controller background consistency check mode
 DELETE        | deletes one or more logical devices
 FAILOVER      | toggles the controller autotomatic failover mode
 SNAPSHOT      | creates a copy of a logical device
 GETCONFIG     | prints controller information
 GETLOGS       | gets controller log information
 GETSTATUS     | displays the status of running tasks
 GETVERSION    | prints version information for all controllers
 IDENTIFY      | blinks LEDS on device(s) connected to a controller
 KEY           | installs a Feature Key onto a controller
 MODIFY        | performs RAID Level Migration or Online Capacity Expansion
 RESCAN        | checks for new or removed drives
 ROMUPDATE     | updates controller firmware
 SETALARM      | controls the controller alarm, if present
 SETCONFIG     | restores the default configuration
 SETCACHE      | adjusts physical or logical device cache mode
 SETNAME       | renames a logical device given its logical device number
 SETPRIORITY   | changes specific or global task priority
 SETSTATE      | manually sets the state of a physical or logical device
 TASK          | performs a task such as build/verify on a physical or logical device
Get info
[root@r StorMan]# ./arcconf getconfig 1
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
   Controller Status                        : Optimal
   Channel description                      : SAS/SATA
   Controller Model                         : Adaptec 3405
   Controller Serial Number                 : 7C2102FD
   Physical Slot                            : 3
   Temperature                              : 32 C/ 89 F (Normal)
   Installed memory                         : 128 MB
   Copyback                                 : Disabled
   Background consistency check             : Disabled
   Automatic Failover                       : Enabled
   Defunct disk drive count                 : 0
   Logical devices/Failed/Degraded          : 1/0/0
   --------------------------------------------------------
   Controller Version Information
   --------------------------------------------------------
   BIOS                                     : 5.2-0 (12379)
   Firmware                                 : 5.2-0 (12379)
   Driver                                   : 1.1-5 (2409)
   Boot Flash                               : 5.2-0 (12379)
   --------------------------------------------------------
   Controller Battery Information
   --------------------------------------------------------
   Status                                   : Optimal
   Over temperature                         : No
   Capacity remaining                       : 99 percent
   Time remaining (at current draw)         : 3 days, 1 hours, 11 minutes

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
   Logical device name                      : raid 10
   RAID level                               : 10
   Status of logical device                 : Optimal
   Size                                     : 139798 MB
   Stripe-unit size                         : 256 KB
   Read-cache mode                          : Enabled
   Write-cache mode                         : Enabled (write-back)
   Write-cache setting                      : Enabled (write-back) when protected by battery
   Partitioned                              : Yes
   Protected by Hot-Spare                   : No
   Bootable                                 : Yes
   Failed stripes                           : No
   --------------------------------------------------------
   Logical device segment information
   --------------------------------------------------------
   Group 0, Segment 0                       : Present (0,0) 3LQ1330009747KV9
   Group 0, Segment 1                       : Present (0,1) 3LQ1SJ7009747KHQ
   Group 1, Segment 0                       : Present (0,2) 3LQ2TB3009746GYX
   Group 1, Segment 1                       : Present (0,3) 3L17LJ6009750G35


----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
      Device #0
         Device is a Hard drive
         State                              : Online
         Supported                          : Yes
         Transfer Speed                     : SAS 3.0 Gb/s
         Reported Channel,Device            : 0,0
         Reported Location                  : Connector 0, Device 0
         Vendor                             : SEAGATE
         Model                              : ST373455SS
         Firmware                           : 0002
         Serial number                      : 3LQ1323Y00009747KV9
         World-wide name                    : 5000C50004962A3
         Size                               : 70007 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
      Device #1
         Device is a Hard drive
         State                              : Online
         Supported                          : Yes
         Transfer Speed                     : SAS 3.0 Gb/s
         Reported Channel,Device            : 0,1
         Reported Location                  : Connector 0, Device 1
         Vendor                             : SEAGATE
         Model                              : ST373455SS
         Firmware                           : 0002
         Serial number                      : 3LQ12SJ70747KHQ
         World-wide name                    : 5000C50004961D8
         Size                               : 70007 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
      Device #2
         Device is a Hard drive
         State                              : Online
         Supported                          : Yes
         Transfer Speed                     : SAS 3.0 Gb/s
         Reported Channel,Device            : 0,2
         Reported Location                  : Connector 0, Device 2
         Vendor                             : SEAGATE
         Model                              : ST373455SS
         Firmware                           : 0002
         Serial number                      : 3LQ12T009746GYX
         World-wide name                    : 5000C500049604F
         Size                               : 70007 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No
      Device #3
         Device is a Hard drive
         State                              : Online
         Supported                          : Yes
         Transfer Speed                     : SAS 3.0 Gb/s
         Reported Channel,Device            : 0,3
         Reported Location                  : Connector 0, Device 3
         Vendor                             : SEAGATE
         Model                              : ST373455SS
         Firmware                           : 0002
         Serial number                      : 3LQ17LJ60000G35
         World-wide name                    : 5000C500049FE35
         Size                               : 70007 MB
         Write Cache                        : Enabled (write-back)
         FRU                                : None
         S.M.A.R.T.                         : No


Command completed successfully.
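For monitoring, you mostly care about a few of those fields. A rough sketch: `arcconf_check` (a hypothetical name) reads `arcconf getconfig` output on stdin, flags any controller or logical device status other than "Optimal" plus any defunct drives, and prints "OK" otherwise. The field names are taken from the output above and may differ in other arcconf versions.

```shell
#!/bin/sh
# Sketch: summarize "arcconf getconfig" output read from stdin.
# arcconf_check is a hypothetical helper; field names come from the
# output above. Prints "OK" and returns 0 only if nothing looks wrong.

arcconf_check() {
    awk -F': ' '
        BEGIN { bad = 0 }
        /Controller Status|Status of logical device/ {
            v = $2
            gsub(/^ +| +$/, "", v)           # trim padding around the value
            if (v != "Optimal") { print "NOT OPTIMAL: " $0; bad = 1 }
        }
        /Defunct disk drive count/ {
            if ($2 + 0 > 0) { print "DEFUNCT DRIVES: " ($2 + 0); bad = 1 }
        }
        END { if (!bad) print "OK"; exit bad }'
}

# Usage (with LD_PRELOAD set as above if arcconf needs libstdc++.so.5):
# /usr/StorMan/arcconf getconfig 1 | arcconf_check || echo "check the RAID"
```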

Sunday, February 01, 2009

FreeNX is cool

If you ever need to access a CentOS (or RHEL, for that matter) server remotely, use FreeNX. It's very easy to install; just follow the instructions on this page: http://wiki.centos.org/HowTos/FreeNX
Note that you can keep password authentication (assuming you have a strong password) instead of key-based pass-through authentication. The instructions are still pretty much the same; you just skip some steps. Performance: wow, it's pretty fast, and it looks just like my own desktop. I'm using the NoMachine client on Ubuntu 8.10.

Wednesday, January 28, 2009

Gotta love Ubuntu

I manage many Linux servers (CentOS) but never tried or considered using Linux as my main desktop. Today, I tried Ubuntu on an existing Vista PC, and installing it as a virtual machine without any partitioning is the bomb. It makes trying out Ubuntu (in addition to the Live CD) a breeze, really simple! Kudos to the Ubuntu team. The way things work is also very similar to Windows (though Macs are different for sure). Microsoft should be worried: for many, many users, Ubuntu will be more than enough, and user-friendly enough. Just like Mac OS X, it's easy to use, and yet if you need to look under the hood, you can easily do so. Competition is indeed healthy: now we have a browser war (I install and use Firefox, IE, Safari, Chrome, and Opera on a frequent basis) and an OS war (Linux, Windows, Mac OS).

Tuesday, January 06, 2009

alias with argument in bash shell

Instead of running "ssh -p 1234 -l shelluser serverX.example.com", you could just run "myssh serverX". A bash alias can't insert an argument in the middle of a command, so define a function instead by putting this into your .bash_profile:
myssh() { ssh -p 1234 -l shelluser "$1".example.com; }
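A slightly extended sketch that also passes any extra arguments through to ssh, so you can run a remote command in one line. Port 1234, "shelluser", and example.com are the same placeholder values as above.

```shell
#!/bin/bash
# Sketch: ssh shortcut that forwards extra arguments (e.g. a remote command).
# Port 1234, "shelluser" and example.com are placeholders from the example above.

myssh() {
    local host=$1
    shift                       # everything after the host name goes to ssh
    ssh -p 1234 -l shelluser "$host.example.com" "$@"
}

# Usage:
# myssh serverX               # interactive login
# myssh serverX uptime        # run "uptime" on serverX.example.com
```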

Tuesday, December 30, 2008

2 tips - Check PHP syntax and SVN move multiple files

This is what I have in my .bash_profile. It checks all the PHP files, recursively, for syntax errors.
alias php_check_syntax='find . -type f -name \*.php -exec php -l {} \;'
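The alias prints a line for every file, which gets noisy on a big tree. A sketch that reports only the files that fail and returns a non-zero status (handy before a commit); `php_lint_all` is a hypothetical name, and it assumes file names don't contain newlines.

```shell
#!/bin/sh
# Sketch: run "php -l" on every .php file under a directory and report
# only the failures. Returns 1 if any file has a syntax error.
# php_lint_all is a hypothetical helper; assumes no newlines in file names.

php_lint_all() {
    dir=${1:-.}
    find "$dir" -type f -name '*.php' | (
        failed=0
        while IFS= read -r f; do
            # </dev/null keeps php from consuming the file list on stdin
            if ! php -l "$f" </dev/null >/dev/null 2>&1; then
                echo "Syntax error: $f"
                failed=1
            fi
        done
        exit "$failed"          # exits only this subshell
    )
}

# Usage:
# php_lint_all . || echo "fix the files above before committing"
```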
The other tip is for moving multiple files in SVN. You can make it a script if you want.
for i in *.test.php ; do svn move "$i" tests/ ; done

Friday, December 26, 2008

XAMPP vs. doing it on your own

I thought I would have more control by setting up the packages myself, but I was wrong. After finding that the PHP pre-installed on Mac OS X Leopard lacks GD, IMAP, and other components, I tried fink, MacPorts, and compiling manually. LAMP still rules over "MAMP" for good reasons: although the Mac is quite dev-friendly, it still takes hours of searching and setup. I find the binary packages on Linux/CentOS (e.g. "yum install/update php-gd") much easier to deal with. I guess I learned from the process, but it would have been much simpler to go with XAMPP. The conf files are all in one place and easy to find, and getting it up and running is really simple. If you just need a dev environment, don't bother with a manual installation that can take hours. Now this Mac is ready for some development.

Tuesday, November 11, 2008

Adding a second hard drive under Linux/CentOS

Check the new drive

In this example, I have 2x500GB drives, sda and sdb. sda is already in use and sdb is new.
[root@s3 mathie]# fdisk -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14       60801   488279610   8e  Linux LVM

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System

Partition the new drive

For my purposes, I want the whole of sdb to be /home. Your partitioning scheme might be different, but fdisk is easy to use.
[root@s3 mathie]# fdisk /dev/sdb

The number of cylinders for this disk is set to 60801.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)
Create a new partition
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-60801, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-60801, default 60801):
Using default value 60801
Check the new drive
Command (m for help): p

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       60801   488384001   83  Linux
Write and save
Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Format the new partition(s)

[root@s3 mathie]# /sbin/mkfs -t ext3 /dev/sdb1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
61063168 inodes, 122096000 blocks
6104800 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3727 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Mount the new drive

mkdir -p /home   # no-op if /home already exists
mount /dev/sdb1 /home

# Add this line to /etc/fstab so the drive is mounted automatically at boot
/dev/sdb1               /home                   ext3    defaults        0 0
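When this step is scripted across several servers, it is easy to append the same fstab line twice. Below is a minimal sketch that appends the entry only when the mount point is not listed yet. It assumes the standard whitespace-separated fstab fields shown above, and the helper name `add_fstab_entry` is hypothetical:

```shell
# add_fstab_entry DEVICE MOUNTPOINT FSTYPE [FSTAB]
# Hypothetical helper: appends "DEVICE MOUNTPOINT FSTYPE defaults 0 0" to FSTAB
# (default /etc/fstab) unless MOUNTPOINT is already the 2nd field of a line.
add_fstab_entry() {
    dev=$1 mnt=$2 fstype=$3 fstab=${4:-/etc/fstab}
    # Skip comment lines; field 2 of an fstab entry is the mount point.
    if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
        echo "$mnt is already in $fstab, leaving it unchanged"
        return 0
    fi
    printf '%s\t%s\t%s\tdefaults\t0 0\n' "$dev" "$mnt" "$fstype" >> "$fstab"
}

# Usage (as root):
# add_fstab_entry /dev/sdb1 /home ext3
# mount -a
```

Running `mount -a` afterwards mounts everything listed in fstab that is not mounted yet. This way, a typo in the entry shows up now instead of at the next boot.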

Wednesday, October 29, 2008

Getting lighttpd RPMs

There are many places where you can download the latest lighttpd RPMs:
Primary: 
http://packages.sw.be/lighttpd/
http://dag.wieers.com/rpm/packages/lighttpd/

Secondary:
EL4: http://ftp.freshrpms.net/pub/freshrpms/redhat/testing/EL4/lighttpd/
EL5: http://www.kevindustries.com/media/kw/files/linux/lighttpd/RPMS/EL5/
EL5 x86_64: http://linuxwave.blogspot.com/2007/08/installing-lighttpd-in-centos-5-for.html
1.4.17: http://www.kevindustries.com/media/kw/files/linux/lighttpd/RPMS/

Friday, October 17, 2008

Stress test a new server

Got a new server and want to stress-test its CPU and disk? Run Folding@Home (or another distributed-computing client) together with BitTorrent (downloading some Linux distributions such as Fedora or CentOS). You test the hardware and help the community at the same time.

Folding@Home

wget http://www.stanford.edu/group/pandegroup/folding/release/FAH6.02-Linux.tgz
tar -zxf FAH6.02-Linux.tgz
./fah6 --config
# single quotes keep "$@" literal, so the wrapper passes its own arguments to fah6
echo './fah6 -smp -verbosity 9 "$@" &' > fah
chmod +x fah
./fah > /dev/null &

BitTorrent

Download the RPM packages from:
http://dag.wieers.com/rpm/packages/bittorrent/
http://dag.wieers.com/rpm/packages/python-crypto/
Then start all the .torrent files in the current directory (.) in the background:
nohup launchmany-console --saveas_style 1 --max_upload_rate 600 --display_interval 5 . > torrent.log &

Friday, October 03, 2008

How to automatically reboot after a kernel panic?

You should investigate the root cause when possible. However, if the machine needs to be back up without much interruption, you can make it reboot automatically a set number of seconds after a panic (60 in this example). Put this into /etc/sysctl.conf:
kernel.panic = 60
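The file is read at boot. To apply the setting immediately, reload it with `sysctl -p`, or set the value directly with `sysctl -w kernel.panic=60`. Below is a minimal sketch of a helper that sets a key in a sysctl.conf-style file without duplicating it. `set_sysctl_conf` is a hypothetical name, and the helper relies on GNU sed's `-i`:

```shell
# set_sysctl_conf KEY VALUE [FILE]
# Hypothetical helper: rewrites an existing "KEY = ..." line in FILE
# (default /etc/sysctl.conf), or appends "KEY = VALUE" if the key is missing.
set_sysctl_conf() {
    key=$1 value=$2 file=${3:-/etc/sysctl.conf}
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$file" 2>/dev/null; then
        # Replace the existing line in place (GNU sed -i)
        sed -i "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$file"
    else
        echo "$key = $value" >> "$file"
    fi
}

# Usage (as root): persist the setting, then load it without a reboot
# set_sysctl_conf kernel.panic 60
# sysctl -p
```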

Sunday, September 21, 2008

pecl and memory limit error

If you run "pecl install [something]" and get the error "Fatal error: Allowed memory size of 8388608 bytes exhausted (tried to allocate xxx bytes)", edit "/usr/bin/pecl" (run "locate pecl" if it is somewhere else) and specify a larger memory limit:
#!/bin/sh
exec /usr/bin/php -C -n -q -d include_path=/usr/share/pear \
    -d output_buffering=1 /usr/share/pear/peclcmd.php "$@"
becomes
#!/bin/sh
exec /usr/bin/php -C -n -q -d include_path=/usr/share/pear \
    -d memory_limit=16M -d output_buffering=1 /usr/share/pear/peclcmd.php "$@"
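A PEAR upgrade may overwrite the wrapper, so you might need to reapply the edit. Below is a minimal sketch that inserts the flag only if no memory_limit is set yet, and keeps a .bak copy of the original. It assumes the wrapper contains `-d output_buffering=1` exactly as shown above and relies on GNU sed. The helper name is hypothetical:

```shell
# raise_pecl_memory_limit [WRAPPER] [LIMIT]
# Hypothetical helper: inserts "-d memory_limit=LIMIT" before
# "-d output_buffering=1" in the pecl wrapper, unless a limit is already set.
raise_pecl_memory_limit() {
    wrapper=${1:-/usr/bin/pecl} limit=${2:-16M}
    if grep -q 'memory_limit=' "$wrapper"; then
        echo "memory_limit already set in $wrapper"
        return 0
    fi
    # GNU sed: edit in place, keeping the original as WRAPPER.bak
    sed -i.bak "s/-d output_buffering=1/-d memory_limit=$limit -d output_buffering=1/" "$wrapper"
}

# Usage (as root):
# raise_pecl_memory_limit /usr/bin/pecl 16M
```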

Friday, September 19, 2008

ip_conntrack and dropped packets

On busy servers, the ip_conntrack table can fill up quickly. It must be monitored, or you will get intermittent packet drops. Check /var/log/messages for "ip_conntrack: table full, dropping packet" errors. A couple of kernel values can be adjusted:
more /proc/sys/net/ipv4/netfilter/ip_conntrack_count
more /proc/sys/net/ipv4/netfilter/ip_conntrack_max
=> count should stay below max; if it gets close to the maximum, increase max

more /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_timeout_established
=> the default is 432000 seconds (5 days); you might want to lower it

# stop conntrack from picking up already-established connections it did not see start
echo 0 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_loose
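The count-versus-max check above is easy to script for a monitoring tool. Below is a minimal sketch that prints the table usage as an integer percentage. It reads the older ip_conntrack files shown above and falls back to the nf_conntrack names used by newer kernels. The function names and the 80% threshold in the usage line are assumptions:

```shell
# conntrack_usage COUNT MAX: print the integer percentage of the table in use
conntrack_usage() {
    echo $(( $1 * 100 / $2 ))
}

# conntrack_usage_now: read the live values
# (ip_conntrack on older kernels, nf_conntrack on newer ones)
conntrack_usage_now() {
    dir=/proc/sys/net/ipv4/netfilter
    if [ -r "$dir/ip_conntrack_count" ]; then
        conntrack_usage "$(cat "$dir/ip_conntrack_count")" "$(cat "$dir/ip_conntrack_max")"
    else
        dir=/proc/sys/net/netfilter
        conntrack_usage "$(cat "$dir/nf_conntrack_count")" "$(cat "$dir/nf_conntrack_max")"
    fi
}

# Usage: warn when the table is more than 80% full (arbitrary threshold)
# [ "$(conntrack_usage_now)" -gt 80 ] && echo "conntrack table nearly full"
```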

Thursday, September 18, 2008

mod_deflate bug

Hmm, another day, another bug: #33499. This time, mod_deflate and PHP don't play well together: when PHP generates a GIF/JPG image, mod_deflate still compresses it. Solution: use mod_filter (only available from Apache 2.1), or disable Apache's compression for PHP files and let PHP do the compression via ob_start('ob_gzhandler');

Wednesday, September 17, 2008

Get network stats for RRD graphing

This snippet displays Active, Passive and Established connections reported by "netstat --statistics" for saving into RRD or other monitoring tools.
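Below is a minimal sketch of such a snippet. It assumes the net-tools `netstat --statistics` wording in the Tcp: section: "active connection(s) openings", "passive connection openings" and "connections established". The exact text varies slightly between versions. The sketch prints the three numbers in the `N:value:value:value` form accepted by `rrdtool update`. The function name and the RRD file in the usage line are hypothetical:

```shell
# tcp_stats_for_rrd: read `netstat --statistics` (netstat -s) output on stdin and
# print "N:<active openings>:<passive openings>:<established>" for rrdtool update.
tcp_stats_for_rrd() {
    awk '
        # Section headers such as "Tcp:" or "TcpExt:" start in column 1
        /^[^ \t]/ { section = $1 }
        section == "Tcp:" && /active connections? openings/ { active = $1 }
        section == "Tcp:" && /passive connection openings/  { passive = $1 }
        section == "Tcp:" && /connections established/      { established = $1 }
        END { printf "N:%d:%d:%d\n", active, passive, established }
    '
}

# Usage (tcp.rrd is a hypothetical, already-created RRD file):
# rrdtool update tcp.rrd "$(netstat --statistics | tcp_stats_for_rrd)"
```

Active and passive openings only ever increase, so they fit a COUNTER or DERIVE data source. The number of established connections is a current value and fits a GAUGE.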

Tuesday, September 16, 2008

Apache versus lighttpd

Both run on the same server: Apache/2.0.59 (port 80) and lighttpd 1.4.19 (port 8080). There are two tests, one with a dynamic file and one with a static file. To make things a little more realistic, the requests go from an EU client to a US server.

Serving a dynamic file

eu$ ab -n 1000 -c 10 "http://us.server/run-some-sql.php"
Server Software:        Apache
Server Port:            80
Document Length:        824 bytes
Time taken for tests:   36.51463 seconds
Total transferred:      1213118 bytes
HTML transferred:       847886 bytes
Requests per second:    27.74 [#/sec] (mean)
Time per request:       360.515 [ms] (mean)
Time per request:       36.051 [ms] (mean, across all concurrent requests)
Transfer rate:          32.84 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      158  158   0.3    158     162
Processing:   174  200  26.4    191     340
Waiting:      173  199  26.3    191     340
Total:        332  358  26.4    349     498

Server Software:        lighttpd/1.4.19
Server Port:            8080
Document Length:        921 bytes
Time taken for tests:   35.406200 seconds
Total transferred:      1202655 bytes
HTML transferred:       857071 bytes
Requests per second:    28.24 [#/sec] (mean)
Time per request:       354.062 [ms] (mean)
Time per request:       35.406 [ms] (mean, across all concurrent requests)
Transfer rate:          33.16 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      158  158   0.6    158     167
Processing:   172  193  29.0    183     383
Waiting:      172  192  29.0    183     383
Total:        330  351  29.1    341     541

Apache:   27.74 requests/sec
lighttpd: 28.24 requests/sec

Serving a static file

eu$ ab -n 1000 -c 10 "http://us.server/img/some-image.gif"
Server Software:        Apache
Server Port:            80
Document Length:        14781 bytes
Time taken for tests:   63.858434 seconds
Total transferred:      15060000 bytes
HTML transferred:       14781000 bytes
Requests per second:    15.66 [#/sec] (mean)
Time per request:       638.584 [ms] (mean)
Time per request:       63.858 [ms] (mean, across all concurrent requests)
Transfer rate:          230.31 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      157  158   0.5    158     164
Processing:   476  478   4.6    478     549
Waiting:      158  159   4.0    159     228
Total:        634  636   4.6    636     707

Server Software:        lighttpd/1.4.19
Server Port:            8080
Document Length:        14781 bytes
Time taken for tests:   63.736261 seconds
Total transferred:      14992000 bytes
HTML transferred:       14781000 bytes
Requests per second:    15.69 [#/sec] (mean)
Time per request:       637.363 [ms] (mean)
Time per request:       63.736 [ms] (mean, across all concurrent requests)
Transfer rate:          229.70 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:      157  158   0.4    158     160
Processing:   476  478   2.2    478     491
Waiting:      158  158   1.5    159     166
Total:        634  636   2.3    636     649

Apache:   15.66 requests/sec
lighttpd: 15.69 requests/sec
Apache performs very decently at low concurrency levels (about 10-20 concurrent requests). When you take its stability, features and modules into account, it is an excellent choice. Lighttpd can perform very well under high load, but it suffers from an issue with PHP (as of lighttpd 1.4.19 and PHP 5.1.6): its FastCGI backend became overloaded and returned 500 errors to clients. Bad lighty, or bad PHP! Hopefully this gets fixed in lighttpd 1.5 or in a future version of PHP.
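You may want to rerun this comparison without reading through the full reports. Below is a minimal sketch that extracts the mean "Requests per second" figure from ab's output and runs the same test against both ports. It reuses the ab options from above. `us.server` is the placeholder host from the tests, and the helper names are hypothetical:

```shell
# ab_rps: read ApacheBench output on stdin and print the mean requests per second
ab_rps() {
    awk -F: '/^Requests per second/ { split($2, f, " "); print f[1] }'
}

# compare_servers URL_PATH: same request against Apache (80) and lighttpd (8080)
compare_servers() {
    for port in 80 8080; do
        rps=$(ab -n 1000 -c 10 "http://us.server:$port$1" 2>/dev/null | ab_rps)
        echo "port $port: $rps requests/sec"
    done
}

# Usage:
# compare_servers /img/some-image.gif
```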

Counting TIME_WAIT with netstat

# netstat -tan | grep ':80 ' | awk '{print $6}' | sort | uniq -c
Sample Output:

     15 CLOSING
     26 ESTABLISHED
     31 FIN_WAIT1
      7 FIN_WAIT2
     14 LAST_ACK
      2 LISTEN
     24 SYN_RECV
   2428 TIME_WAIT
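Note that `grep ':80 '` also matches connections whose remote port is 80, such as outgoing HTTP requests made by the server itself. Below is a minimal sketch that only counts connections whose local address (column 4 of `netstat -tan`) ends in the given port. The function name is hypothetical:

```shell
# count_states PORT: read `netstat -tan` on stdin and count TCP states for
# connections whose *local* address ends in :PORT
count_states() {
    awk -v port=":$1" '
        # column 4 is Local Address; compare its last characters with ":PORT"
        $1 ~ /^tcp/ && substr($4, length($4) - length(port) + 1) == port { n[$6]++ }
        END { for (s in n) printf "%7d %s\n", n[s], s }
    ' | sort -k2
}

# Usage:
# netstat -tan | count_states 80
```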

Tuesday, September 09, 2008

What happens when you do "rm -rf /*"

Just for the fun of it. Here is what happens:
[root@s10 ~]# cd /
[root@s10 /]# dir
bin   dev  initrd  lost+found  misc  opt   sbin     srv  tmp  var
boot  etc  lib     media       mnt   proc  selinux  sys  usr
[root@s10 /]# rm -rf *
rm: cannot remove directory `boot': Device or resource busy
rm: cannot remove directory `dev/shm': Device or resource busy
rm: cannot remove `dev/pts/1': Operation not permitted
rm: `proc/asound/ICH' changed dev/ino: Operation not permitted
[root@s10 /]#
[root@s10 /]# dir
-bash: /usr/bin/dir: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
[root@s10 /]# ll
-bash: ls: command not found
[root@s10 /]# reboot
-bash: /sbin/reboot: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory
Because sshd and the other processes are already loaded and still running, SSH still accepts connections, but you cannot log in or run anything. Was it fun?!

Thursday, August 21, 2008

vmstat - Get an overview of your server

Get an update every second:
[root@s14 trungson]# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 3  0  41200  33324   2152 1489108    0    0     4    27    0     1  5  3 91  0
 2  0  41200  33500   2152 1489108    0    0     8     0 1838  3320  8  4 88  0
 1  0  41200  31452   2152 1489176    0    0    32     0 1787  3078  7  4 89  0
 2  0  41200  33260   2152 1489176    0    0     8     0 1788  2895  6  4 90  0
 1  0  41200  33068   2164 1489164    0    0    32   768 2038  3207  7  4 87  2
 2  0  41200  33132   2168 1489228    0    0    32     0 2082  4422 10  5 85  0
 2  0  41200  35628   2172 1489360    0    0   148     0 1924  3658  8  5 86  1
 0  0  41200  34596   2172 1489360    0    0    16     0 1904  3531  8  5 87  0
 4  0  41200  28636   2172 1489428    0    0   116     0 1922  3732  9  5 85  1
 0  0  41200  33036   2180 1489488    0    0     8   860 2127  3828  8  5 86  1
 1  0  41200  32844   2180 1489488    0    0    20     0 1784  3108  7  5 88  0
 0  0  41200  32780   2180 1489556    0    0    24     0 1850  3108  7  4 88  0
 2  0  41200  32844   2180 1489692    0    0   120     0 1915  3842  9  5 85  0
 2  0  41200  26508   2180 1489828    0    0    32   376 1976  3744  8  6 86  0
From the man page:
Procs
  r: The number of processes waiting for run time.
  b: The number of processes in uninterruptible sleep.
Memory
  swpd: the amount of virtual memory used.
  free: the amount of idle memory.
  buff: the amount of memory used as buffers.
  cache: the amount of memory used as cache.
  inact: the amount of inactive memory. (-a option)
  active: the amount of active memory. (-a option)
Swap
  si: Amount of memory swapped in from disk (/s).
  so: Amount of memory swapped to disk (/s).
IO
  bi: Blocks received from a block device (blocks/s).
  bo: Blocks sent to a block device (blocks/s).
System
  in: The number of interrupts per second, including the clock.
  cs: The number of context switches per second.
CPU
  These are percentages of total CPU time.
  us: Time spent running non-kernel code. (user time, including nice time)
  sy: Time spent running kernel code. (system time)
  id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
  wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
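The first data line of vmstat's output reports averages since boot, so it is usually skipped when summarizing a sample. Below is a minimal sketch that averages the us, sy, id and wa columns over the remaining lines. This makes it easier to tell a box waiting on I/O (high wa) from a CPU-bound one (high us/sy). It assumes the 16-column layout shown above, where any extra trailing column is ignored. The function name is hypothetical:

```shell
# vmstat_avg: read `vmstat DELAY COUNT` on stdin, skip the two header lines and
# the first (since-boot) sample, then print the average us/sy/id/wa columns.
vmstat_avg() {
    awk 'NR > 3 && NF >= 16 { us += $13; sy += $14; id += $15; wa += $16; n++ }
         END { if (n) printf "us=%.1f sy=%.1f id=%.1f wa=%.1f (%d samples)\n",
                             us / n, sy / n, id / n, wa / n, n }'
}

# Usage: ten one-second samples after the since-boot line
# vmstat 1 11 | vmstat_avg
```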

Wednesday, August 13, 2008

Use ethtool or mii-tool to detect problems with an Ethernet card

[root@s2 adserver]# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
                      100baseT/Half 100baseT/Full
                      1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Full
Advertised auto-negotiation: Yes
Speed: Unknown! (0)
Duplex: Half
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000033 (51)
Link detected: yes
You can also change the interface settings with ethtool.
[root@s2 adserver]# mii-tool
eth0: negotiated 10baseT-FD, link ok
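In the output above, the card supports 1000baseT/Full, but ethtool reports "Speed: Unknown! (0)" and "Duplex: Half". Below is a minimal sketch that scans `ethtool` output for these symptoms, for use in a cron job or a monitoring check. The function name and the exact conditions it flags are assumptions:

```shell
# check_nic: read `ethtool IFACE` output on stdin and print a warning for a
# missing link, half duplex or unknown speed. Returns 1 if anything was flagged.
check_nic() {
    awk '
        /Link detected: no/ { print "WARNING: no link"; bad = 1 }
        /Duplex: Half/      { print "WARNING: half duplex"; bad = 1 }
        /Speed: Unknown/    { print "WARNING: unknown speed"; bad = 1 }
        END { exit bad }
    '
}

# Usage:
# ethtool eth0 | check_nic || echo "check cabling and the switch port for eth0"
```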

Wednesday, July 16, 2008

Linux CentOS - Kernel panic

This looks like a memory-related error in the sim process. Does anyone have a better clue? The kernel version was 2.6.9-67.0.4.EL; we rebooted and upgraded to 2.6.9-67.0.20.EL. Is there a kernel bug I should be aware of?
Jul 13 04:03:13 host syslogd 1.4.1: restart.
Jul 16 08:00:01 host kernel: swap_free: Unused swap offset entry 00010000
Jul 16 08:00:01 host kernel: swap_free: Unused swap offset entry 00010000
Jul 16 08:45:01 host kernel: Unable to handle kernel paging request at virtual address 313a3921
Jul 16 08:45:01 host kernel:  printing eip:
Jul 16 08:45:01 host kernel: c015eebb
Jul 16 08:45:01 host kernel: *pde = 00000000
Jul 16 08:45:01 host kernel: Oops: 0000 [#1]
Jul 16 08:45:01 host kernel: Modules linked in: ip_vs_wrr ip_vs md5 ipv6 ipt_TOS iptable_mangle ip_conntrack_ftp ip_conntrack_irc ipt_REJECT ipt_LOG ipt_limit iptable_filter ipt_multiport ipt_state ip_conntrack ip_tables autofs4 sunrpc dm_mirror dm_mod button battery ac parport_pc parport 8139too mii ext3 jbd
Jul 16 08:45:01 host kernel: CPU:    0
Jul 16 08:45:01 host kernel: EIP:    0060:[]    Not tainted VLI
Jul 16 08:45:01 host kernel: EFLAGS: 00010202   (2.6.9-67.0.4.EL)
Jul 16 08:45:01 host kernel: EIP is at find_vma+0x29/0x4d
Jul 16 08:45:01 host kernel: eax: 313a3919   ebx: 00c8479c   ecx: 313a3931   edx: c97ec6b4
Jul 16 08:45:01 host kernel: esi: de5b40a0   edi: c8929360   ebp: bff08518   esp: c85dcef4
Jul 16 08:45:01 host kernel: ds: 007b   es: 007b   ss: 0068
Jul 16 08:45:01 host kernel: Process sim (pid: 6909, threadinfo=c85dc000 task=c8929360)
Jul 16 08:45:01 host kernel: Stack: de5b40a0 de5b40d0 c011d901 00000000 00c8479c c85dcfc4 c032ebbf 00000007
Jul 16 08:45:01 host kernel:        0000000e 0000000b 00000000 00000000 00000000 00000000 00000000 00030001
Jul 16 08:45:01 host kernel:        00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Jul 16 08:45:01 host kernel: Call Trace:
Jul 16 08:45:01 host kernel:  [] do_page_fault+0x114/0x4dc
Jul 16 08:45:01 host kernel:  [] do_page_fault+0x0/0x4dc
Jul 16 08:45:01 host kernel:  [] error_code+0x2f/0x38
Jul 16 08:45:01 host kernel:  [] schedule_tail+0xfd/0x106
Jul 16 08:45:01 host kernel:  [] do_page_fault+0x0/0x4dc
Jul 16 08:45:01 host kernel:  [] error_code+0x2f/0x38
Jul 16 08:45:01 host kernel: Code: 5d c3 56 89 c6 53 89 d3 31 d2 85 c0 74 3c 8b 50 08 85 d2 74 0a 39 5a 08 76 05 39 5a 04 76 2b 8b 4e 04 31 d2 85 c9 74 22 8d 41 e8 <39> 58 08 76 0c 39 58 04 89 c2 76 0c 8b 49 0c eb 03 8b 49 08 85
Jul 16 08:45:01 host kernel:  <0>Fatal exception: panic in 5 seconds
Jul 16 10:12:18 host syslogd 1.4.1: restart.

Friday, October 12, 2007

Load balancing FastCGI

Run this command on a worker:
spawn-fcgi -p 8081 -a 192.168.2.100 -f /usr/bin/php-cgi -u lighttpd -g lighttpd -C 5 -P /var/run/spawn-fcgi-8081.pid
Don't forget to open the right port in the firewall (8081 in this example) and to monitor the processes (e.g., restart spawn-fcgi when it dies).
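Below is a minimal sketch of that monitoring, meant to be run as root from cron (e.g. every minute). It checks the PID file written by spawn-fcgi's -P option and re-runs the same spawn-fcgi command if the process is gone. The `is_running` and `respawn_fcgi` helpers and the cron scheduling are assumptions, not part of spawn-fcgi:

```shell
# is_running PIDFILE: succeed if PIDFILE exists and names a live process
is_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# respawn_fcgi: restart the PHP FastCGI worker from the example above if it died
respawn_fcgi() {
    pidfile=/var/run/spawn-fcgi-8081.pid
    if ! is_running "$pidfile"; then
        spawn-fcgi -p 8081 -a 192.168.2.100 -f /usr/bin/php-cgi \
            -u lighttpd -g lighttpd -C 5 -P "$pidfile"
    fi
}
```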

Reference

http://www.cyberciti.biz/tips/lighttpd-mod_proxy-to-run-php-fastcgi-app-server.html

Bind to a socket

spawn-fcgi -s /tmp/php-fastcgi-ext.sock -f /usr/bin/php-cgi -u lighttpd -g lighttpd -C 5 -P /var/run/spawn-fcgi.pid

Bind to an IP:port

spawn-fcgi -p 8081 -a 192.168.2.100 -f /usr/bin/php-cgi -u lighttpd -g lighttpd -C 5 -P /var/run/spawn-fcgi-8081.pid

Reference

http://trac.lighttpd.net/trac/wiki/Docs%3AModFastCGI#load-balancing

lighttpd itself also needs to be running on the worker (service lighttpd start), so the reporter can get status from this server directly (through port 80).

Using an external fcgi in lighttpd.conf to load balance only a specific file

fastcgi.server = (
"/index.php"=>
(
 ("socket"=>"/tmp/php-fastcgi.socket",
  "bin-path"=>"/usr/bin/php-cgi",
  "min-procs"=>2,
  "max-procs"=>4,
  "bin-environment"=>("PHP_FCGI_CHILDREN"=>"10","PHP_FCGI_MAX_REQUESTS"=>"5000")
 ),
 ("host"=>"192.168.2.100",
  "port"=>8081,
  "check-local"=>"disable",
  "disable-time"=>30
 )
),
".php"=>
(
 ("socket"=>"/tmp/php-fastcgi.socket",
  "bin-path"=>"/usr/bin/php-cgi",
  "min-procs"=>1,
  "max-procs"=>2,
  "bin-environment"=>("PHP_FCGI_CHILDREN"=>"5","PHP_FCGI_MAX_REQUESTS"=>"1000")
 )
)
)

Monday, January 29, 2007

Mysterious 500 - Internal Server Error

This is a very generic error, but it means there is some critical issue on the server. We once ran into it because our codebase had grown heavier, and the default memory_limit=8M in php.ini was no longer enough. Solution: increase this value.
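To see which limit the PHP CLI actually uses, run `php -r 'echo ini_get("memory_limit");'`. Note that the web server's PHP may read a different php.ini. Below is a small sketch that converts the php.ini shorthand into bytes. This makes it easy to compare against the "Allowed memory size of N bytes exhausted" message in the error log. The function name is hypothetical:

```shell
# php_size_to_bytes VALUE: convert php.ini shorthand (e.g. 8M, 512K, 1G) to
# bytes; "-1" (no limit) is printed unchanged.
php_size_to_bytes() {
    case $1 in
        -1)    echo -1 ;;
        *[gG]) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *[mM]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[kK]) echo $(( ${1%?} * 1024 )) ;;
        *)     echo $(( $1 )) ;;
    esac
}

# Usage: current CLI limit in bytes
# php_size_to_bytes "$(php -r 'echo ini_get("memory_limit");')"
```

For example, the default 8M converts to 8388608 bytes, the same figure that appears in the pecl error message.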

Monday, April 24, 2006

LVS-Tun & ISPs

LVS is a software load balancing solution. It is open-source, free software built directly into the Linux kernel. The director (load balancer) can be in one DC and the real servers in different DCs. The director only needs good bandwidth; a Pentium 4 or even a P3 is fine, since it does Layer 4 switching (less overhead than Layer 7, e.g. HAProxy). Incoming traffic flows Client -> Director -> Worker, while returning traffic flows Worker -> Client. The director only handles incoming requests, because the return packets come directly from the workers. As a result, it can sustain a much higher throughput. We currently manage several LVS setups. One example has 3 directors and 12 real servers in more than 5 different DCs spanning the US and Europe. It's quite easy to set up and manage.

Reference: LVS-Tun is an LVS original. It is based on LVS-DR and has the same high scalability/throughput of LVS-DR. LVS-Tun can be used with realservers that can tunnel (==IPIP encapsulation). The director encapsulates the request packet inside an IPIP packet before sending it to the realserver. The realserver must be able to decapsulate the IPIP packet. Initially only Linux could decapsulate IPIP packets, but recently FreeBSD and W2K can now do it too (hmm 2005, I think Microsoft has dropped support for IPIP). With LVS-DR, the realservers can have almost any OS.

Unlike LVS-DR, with LVS-Tun the realservers can be on a network remote from the director, and can each be on separate networks. Thus the realservers could be in different countries (e.g. a set of ftp mirror sites for a project). If this is the case, the realservers will be generating reply packets with VIP:port->CIP (where port is the LVS'ed service). Not being on the VIP network, the routers for the realservers will have to be programmed to accept outgoing packets with src_addr=VIP:port. Routers normally drop these packets as an anti-spoofing measure. If you aren't in control of the routers, you'll just have to inform the people who are, that packets from VIP:port are valid for your business. If they don't want to help you with your business, then you should find another provider who will. Read more here and here
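Below is a minimal sketch of the director side. It prints the ipvsadm commands for a tunneled virtual service instead of running them, so you can review them first. In ipvsadm, `-i` selects IPIP tunneling (LVS-TUN), and `-s wrr` selects weighted round robin. The addresses, the port and the `lvs_tun_rules` name are placeholders/hypothetical. Each realserver still needs the VIP configured on its tunl0 interface:

```shell
# lvs_tun_rules VIP PORT REALSERVER...: print ipvsadm commands for a
# weighted-round-robin virtual service forwarded to realservers via IPIP tunnels
lvs_tun_rules() {
    vip=$1 port=$2
    shift 2
    echo "ipvsadm -A -t $vip:$port -s wrr"
    for rip in "$@"; do
        # -i = tunneling (LVS-TUN), -w = weight; LVS-TUN keeps the same port
        echo "ipvsadm -a -t $vip:$port -r $rip:$port -i -w 1"
    done
}

# Usage: review the output, then pipe it to sh as root
# lvs_tun_rules 10.0.0.100 80 192.0.2.11 198.51.100.12 | sh
```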

To check whether an ISP allows LVS-TUN, follow the tests on this page, specifically this one:

realserver# traceroute -s VIRTUAL_IP -n CLIENT_IP
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
Be patient. On the director, you should eventually see something similar to the following:
director# tcpdump -ln host CLIENT_IP
tcpdump: listening on eth0
19:20:20.310162 CLIENT_IP > VIRTUAL_IP: icmp: CLIENT_IP udp port 33483 unreachable
19:22:40.639844 CLIENT_IP > VIRTUAL_IP: icmp: CLIENT_IP udp port 33511 unreachable
19:22:45.641061 CLIENT_IP > VIRTUAL_IP: icmp: CLIENT_IP udp port 33512 unreachable
19:23:30.664315 CLIENT_IP > VIRTUAL_IP: icmp: CLIENT_IP udp port 33521 unreachable
If you don't see any response on the director, the realserver probably cannot get any packets out to the client because the ISP's router is dropping them.

It is very important that ISPs see the demand for LVS-TUN setups, so they can distinguish it from malicious network attacks. Security is good, but it should not be so strict or rigid that it blocks flexibility and business growth. If you have set up LVS-TUN with other ISPs or web hosting companies, please let me know so I can add them to the list.

ISPs that support LVS-TUN (they allow outgoing spoofed-yet-valid packets from the realservers):

  1. LayeredTech: at the Savvis building in Dallas (their other DC, DataBank, blocks this). Currently working with LT to unblock it. Update: LT is very accommodating to their clients; they excluded our load balancer's IP address from the router filter list.
  2. Hivelocity: blocked but then unblocked, willing to make an exception.
  3. 1paket at Lambdanet in Germany
  4. SoftLayer: custom router setup
  5. WebNX in LA
ISPs that do NOT support LVS-TUN (they drop these packets and are not willing to make an exception):
  1. ThePlanet: denied; not willing to make an exception in their network filter for this type of packet, as it is against their AUP.
