CacheRoot /var/adm/apache/cache
CacheSize 4096000
CacheGcInterval 4
CacheMaxExpire 48
CacheDefaultExpire 2
The above tells Apache to inspect the cache directory every 4 hours, keep up to 4GB worth of data (CacheSize is in KB), revalidate objects after at most 48 hours (no matter what expiry the origin server set) and expire them, by default, in 2 hours.
The typical content of the cache directory for our sites is 4.5GB in about 180,000 directories and 310,000 files.
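If you want to check the same numbers on your own cache, something along these lines works (plain du/find, nothing site-specific; the path is just our CacheRoot):
$ du -sh /var/adm/apache/cache
$ find /var/adm/apache/cache -type d | wc -l
$ find /var/adm/apache/cache -type f | wc -l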
Shortly after deploying this to production we started seeing the pattern below for the load on all web servers:
[load graph]
Because this was happening in regular intervals it was easy to inspect one of the servers during the load spike. The sar data had already told us the load was caused by disk IO:
$ sar -b
(...)
tps rtps wtps bread/s bwrtn/s
11:30:01 AM 94.53 1.89 92.63 27.68 1275.66
11:40:01 AM 1289.34 57.63 1231.71 555.35 20566.36
11:50:01 AM 79.16 1.23 77.93 21.88 1084.00
12:00:02 PM 94.21 1.77 92.44 39.52 1274.74
12:10:01 PM 84.18 1.64 82.54 35.54 1116.60
12:20:01 PM 117.43 2.62 114.81 44.68 1550.87
12:30:01 PM 150.65 3.30 147.35 82.21 1986.41
12:40:01 PM 1228.90 1.53 1227.37 33.04 20592.15
12:50:01 PM 70.46 2.65 67.80 100.57 917.25
When the load spiked I checked which processes were high on "wa" (waiting for IO) time and listed their open files with lsof. Sure enough, I could see httpd processes with changing files open in /var/adm/apache/cache. This is expected, since the cache garbage collector needs to inspect cached objects and decide what to delete in order to stay under the CacheSize limit.
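For reference, the inspection amounted to something like this (a sketch rather than the exact session; <pid> stands for whatever httpd process shows up in D state, i.e. uninterruptible IO sleep):
$ ps -eo pid,stat,comm | awk '$2 ~ /^D/'
$ lsof -p <pid> | grep /var/adm/apache/cache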
What surprised me is how amazingly expensive this operation is on ext3 filesystems.
To alleviate the problem I changed some filesystem parameters. In /etc/fstab I told the OS not to update access times on files and directories under /var:
/dev/VolGroup00/LogVol04 /var ext3 defaults,noatime,nodiratime 1 2
Without noatime and nodiratime, every file and directory read meant a disk write to update the access time. Since we have no use for access times in /var, I did away with them.
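The new flags don't require a reboot; remounting /var with the options is enough (assuming nothing blocks the remount):
# mount -o remount,noatime,nodiratime /var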
With tune2fs I changed the data option from the default (ordered) to writeback:
# tune2fs -o journal_data_writeback /dev/mapper/VolGroup00-LogVol04
The writeback mode can cause file corruption on unclean shutdowns, but the benefits outweigh the risk for us. Our web content in /var is replicated from a master server, so at worst we would lose non-critical data such as some state files and access log entries.
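Two side notes, in case they help. tune2fs only records writeback as the default journal mode in the superblock, and ext3 won't switch data modes on a live remount, so the change takes effect on the next full umount/mount. You can get the same result by passing data=writeback in /etc/fstab (assuming your kernel accepts it for this mount), and dumpe2fs will show whether the tune2fs change took:
/dev/VolGroup00/LogVol04 /var ext3 defaults,noatime,nodiratime,data=writeback 1 2
# dumpe2fs -h /dev/mapper/VolGroup00-LogVol04 | grep -i 'mount options'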
After these changes the GC is barely noticeable. If you look at the graph above you will see that there is no load spike from 13:00 onwards.