Last week a new module named Purge appeared on Drupal.org.
Purge is designed to selectively purge affected pages from a caching reverse proxy after changes are made to the site. The Purge module can also be used to add this functionality to nginx's native cache.
The Purge module relies upon the Expire module. Expire appeared some time ago, when Boost developer mikeytown2 ripped the cache expiration code out of the Boost module and released it as the Expire module, an API for other cache modules to use. The Varnish integration module now uses Expire to perform selective purges from the Varnish cache.
When content on a site changes, such as a node or a user page, the Expire module checks whether that content appears in any forum lists, taxonomy lists, the internal alias list, Views lists or CCK references, and then builds an array of the URLs affected by the change. For example, a change to http://mysite.com/node/1231 might also change the front page, http://mysite.com/; the forum index, http://mysite.com/forum, if the node is a forum node; the specific forum, http://mysite.com/forum/12; and any aliases such as http://mysite.com/title-of-node or http://mysite.com/forum/general-discussion.
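To make the expansion step concrete, here is a toy Python model of it. It is not Expire's code (Expire is a PHP Drupal module). The function name `affected_urls` and its parameters are hypothetical stand-ins for the forum, taxonomy, alias, Views and CCK lookups that Expire actually performs.

```python
"""Toy model of Expire-style URL expansion (illustration, not Expire's code)."""
from typing import Iterable, List


def affected_urls(
    base: str,
    node_path: str,
    aliases: Iterable[str] = (),
    forum_paths: Iterable[str] = (),
    on_front_page: bool = True,
) -> List[str]:
    """Return the absolute URLs whose cached copies may be stale after an edit.

    aliases and forum_paths are hypothetical inputs standing in for the
    lookups Expire does (forums, taxonomy, path aliases, Views, CCK).
    """
    paths = [node_path]
    if on_front_page:
        paths.append("")  # the front page, e.g. http://mysite.com/
    paths.extend(forum_paths)
    paths.extend(aliases)

    urls: List[str] = []
    for path in paths:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        if url not in urls:  # preserve order, drop duplicates
            urls.append(url)
    return urls
```

Called as `affected_urls("http://mysite.com", "node/1231", aliases=["title-of-node", "forum/general-discussion"], forum_paths=["forum", "forum/12"])`, it returns the six URLs from the example above. The node page comes first, followed by the front page, the forum pages and the aliases.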
The Expire module then passes that list of changed URLs to the Purge module, which uses php-curl to send specially crafted HTTP requests to your web server or proxy. The Squid proxy supports this natively: send Squid an HTTP request using the "PURGE" method instead of "GET", and Squid will purge whatever URL you send from its cache. Varnish has its own text-based administration interface, which the Varnish integration module uses for purge requests, but the Varnish Configuration Language (VCL) can also be set up to handle Squid-style HTTP PURGE requests.
Fortunately for us, nginx can also be configured to accept the HTTP requests generated by the Purge module and use them to purge pages from the nginx cache.
First we need to compile nginx with a third-party module that adds cache purge functionality, since the feature does not exist in the stock nginx code. The module is available at https://github.com/FRiCKLE/ngx_cache_purge/ and I've added it to the nginx package in my Ubuntu PPA: https://launchpad.net/~brianmercer/+archive/nginx
Then make sure the php-curl extension is installed on your server (php5-curl on Debian/Ubuntu).
The nginx configuration begins with a directive at the http level that creates the cache. Since I'm using php5-fpm as my backend, I define the cache for use through the FastCGI interface:
fastcgi_cache_path /var/cache/nginx/mycache levels=1:2 keys_zone=mycache:1m inactive=30d max_size=2g;
The specific options are not the subject of this post, but in short, this creates a cache that stores files in /var/cache/nginx/mycache using a two-level directory structure. With more nodes you'll want a deeper structure so that no single directory ends up holding thousands of files. The cache is named "mycache" and gets a 1 megabyte shared memory zone for its lookup table. A larger cache needs a larger zone, but keep in mind that the zone holds only hashed keys and metadata, not page data; the nginx documentation puts it at about 8,000 keys per megabyte. Files that have not been accessed for 30 days are removed regardless of whether they have expired, and the cache is limited to 2GB on disk.
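Because the cache lives on disk, it helps to know where a given page ends up. According to the nginx documentation, a cached file is named after the MD5 hash of its cache key, and `levels=1:2` takes the directory names from the end of that hash. For example, hash `b7f54b2df7773722d382f4809d65029c` is stored under `c/29/`. The Python sketch below computes the path. `cache_file_path` and `hash_dirs` are hypothetical helpers. The key must be the exact string nginx builds from `fastcgi_cache_key`; with the location configuration in this post, that is `$host$request_uri`, e.g. `mysite.com/node/1231`.

```python
"""Where nginx stores a cached response on disk, for a given cache key."""
import hashlib
import os


def hash_dirs(digest: str, levels=(1, 2)) -> list:
    """Directory names for a hex digest, taken from its end (levels=1:2)."""
    dirs = []
    end = len(digest)
    for width in levels:
        dirs.append(digest[end - width:end])
        end -= width
    return dirs


def cache_file_path(cache_root: str, key: str, levels=(1, 2)) -> str:
    """File nginx would use for `key` under the fastcgi_cache_path root."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return os.path.join(cache_root, *hash_dirs(digest, levels), digest)
```

With `levels=1:2` the hash spreads entries over 16 * 256 = 4,096 leaf directories. This is why the two-level layout keeps any single directory small.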
Next we alter the location that handles Drupal requests. This will vary depending on your configuration, but here's mine:
location = /index.php {
    include /etc/nginx/fastcgi_params;
    fastcgi_param SCRIPT_FILENAME /var/www/$host/drupal/index.php;
    fastcgi_hide_header X-Drupal-Cache; #optional
    fastcgi_hide_header Etag; #optional
    fastcgi_pass php;

    # Cache Settings
    set $nocache "";
    if ($http_cookie ~ SESS) { #logged in users should bypass the cache
        set $nocache "Y";
    }
    if ($request_uri ~ \? ) { # Purge doesn't handle query strings yet
        set $nocache "Y";
    }
    fastcgi_cache mycache;
    fastcgi_cache_key $host$request_uri;
    fastcgi_cache_valid 200 301 1d;
    fastcgi_ignore_headers Cache-Control Expires;
    fastcgi_cache_bypass $nocache;
    fastcgi_no_cache $nocache;
    add_header X-nginx-Cache $upstream_cache_status; #optional
    expires epoch;
}
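The caching rules in that location block come down to a few conditions. This Python model restates them as an illustration of the configuration's logic; nginx does not run it, and the function names are hypothetical. It is also simplified. For example, nginx additionally declines to cache responses that carry a Set-Cookie header.

```python
"""Restates the caching rules of the location block (illustration only)."""

CACHEABLE_STATUSES = {200, 301}  # fastcgi_cache_valid 200 301 1d;


def cache_key(host: str, request_uri: str) -> str:
    """fastcgi_cache_key $host$request_uri; the host keeps domains apart."""
    return host + request_uri


def nocache(cookie_header: str, request_uri: str) -> bool:
    """True when the block would set $nocache to "Y" (bypass and don't store)."""
    if "SESS" in cookie_header:  # Drupal session cookie: logged-in user
        return True
    if "?" in request_uri:  # query-string pages can't be purged, so don't cache
        return True
    return False


def would_store(cookie_header: str, request_uri: str, status: int) -> bool:
    """fastcgi_no_cache plus fastcgi_cache_valid (simplified, see lead-in)."""
    return not nocache(cookie_header, request_uri) and status in CACHEABLE_STATUSES
```

Because the key starts with the hostname, two domains that share `mycache` never overwrite each other's entries.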
Then we create a new server listening on an arbitrary port on the localhost interface:
## Cache purging
server {
    listen 127.0.0.1:8888 default_server;
    access_log /var/log/nginx/caching.access.log;
    keepalive_timeout 0 0;
    error_page 405 $request_uri;

    location / {
        fastcgi_cache_purge mycache $host$request_uri;
        return 200; #use until Purge module logs 404s better
    }
}
The "error_page" directive is required to keep nginx from rejecting requests that use the "PURGE" method with a 405 "Method Not Allowed" error. The "return 200" is required because nginx returns a 404 "Not Found" when the page is not in the cache. The Purge module currently logs that as an error, and we don't want those pink lines in our logs. Squid does the same thing, so this is really a matter of the Purge module treating a cache miss as a non-error response.
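If the Purge module treated a 404 from the proxy as a cache miss rather than a failure, the `return 200` workaround would not be needed. Here is a minimal Python sketch of that client-side behaviour. It assumes a hypothetical `record_purge_result` hook that receives each URL and the proxy's status code. It is not the Purge module's code.

```python
"""How a purge client could log replies (hypothetical hook, not Purge's code)."""
import logging

log = logging.getLogger("purge")


def record_purge_result(url: str, status: int) -> bool:
    """Log one purge reply; return False only if the purge may have failed.

    A 404 means the proxy had no copy of url, which is the desired end state
    anyway, so it is logged as information rather than as an error.
    """
    if status == 200:
        log.info("purged %s", url)
        return True
    if status == 404:
        log.info("not cached, nothing to purge: %s", url)
        return True
    log.error("purge of %s failed: HTTP %d", url, status)
    return False
```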
Then enable the Purge and Expire modules and set the proxy URL at admin/settings/purge to "http://127.0.0.1:8888".
This configuration can serve multiple domains from the same cache, because $host is part of the cache key.
If you use something like Firebug to check the headers, the X-nginx-Cache header will show a value such as HIT, MISS or BYPASS. The watchdog log will contain entries from the Expire module listing each URL purged.
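The same check can be scripted outside the browser. This sketch uses only Python's standard library. `cache_status` and `cache_status_from_headers` are hypothetical names, and the sketch assumes the optional `add_header X-nginx-Cache $upstream_cache_status` line is enabled in the location block.

```python
"""Read the X-nginx-Cache header from Python instead of Firebug."""
import sys
import urllib.request
from typing import Iterable, Tuple


def cache_status_from_headers(headers: Iterable[Tuple[str, str]]) -> str:
    """Pick X-nginx-Cache out of (name, value) pairs, ignoring name case."""
    for name, value in headers:
        if name.lower() == "x-nginx-cache":
            return value.strip()
    return ""  # header absent, e.g. a response not served by that location


def cache_status(url: str, cookie: str = "") -> str:
    """GET url (optionally sending a Cookie header) and return its cache status."""
    req = urllib.request.Request(url)
    if cookie:
        req.add_header("Cookie", cookie)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return cache_status_from_headers(resp.getheaders())


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python cache_status.py http://mysite.com/node/1231
    print(cache_status(sys.argv[1]))
```

Requesting the same anonymous page twice should report MISS and then HIT. Passing a cookie whose text contains SESS should report BYPASS.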
Note that Boost does purge cached pages with query strings, but Expire does not. I see no solution at the moment other than to disable caching for any URL with a query string. That brings us to the greatest limitation of nginx cache purging: unlike Varnish, nginx does not perform wildcard purging.
Purge could handle URLs with query strings like http://test.brianmercer.com/forums/loquor-haero/camur-ibidem-quadrum?sor... if you could purge test.brianmercer.com/forums/loquor-haero/camur-ibidem-quadrum* with a trailing wildcard that catches every cached variant of the page. But the nginx cache purge module doesn't support wildcards, and as far as I know there are no plans to add them.
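A toy model shows why query strings have to be excluded from the cache. Each dict entry stands in for a cached page keyed by `$host$request_uri`. `purge_exact` mimics what the purge module does, which is to remove one exact key. `purge_with_query_variants` is a hypothetical wildcard-style purge of the kind nginx doesn't offer.

```python
"""Exact-key purge versus a wildcard-style purge (toy model of the cache)."""


def purge_exact(cache: dict, key: str) -> int:
    """Remove one exact key, as fastcgi_cache_purge does; return how many."""
    if key in cache:
        del cache[key]
        return 1
    return 0


def purge_with_query_variants(cache: dict, key: str) -> int:
    """Hypothetical: remove key and every "key?..." variant; return how many."""
    doomed = [k for k in cache if k == key or k.startswith(key + "?")]
    for k in doomed:
        del cache[k]
    return len(doomed)
```

Given the keys `mysite.com/forum/12`, `mysite.com/forum/12?page=1` and `mysite.com/forum/12?sort=asc`, an exact purge leaves both query-string copies stale, while the variant purge clears all three. A naive trailing `*` would also remove an unrelated `mysite.com/forum/123`, so the sketch matches only the page itself and keys that continue with `?`.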
There are other limitations of Purge and Expire as well. As far as I know, Expire does not yet know how to purge Panels pages, although it does support Views and CCK.
Also, Purge and Expire are both still in development and have no official releases. The Varnish module says it uses Expire; however, drupal.org stats show 1384 users for Varnish and only 71 for Expire. I'm concerned that new development on the Expire code might be happening in Varnish without being ported back to Expire.
Varnish can also purge the entire cache for a domain. nginx can only do this with something like "rm -rf /var/cache/nginx/mycache", which clears the whole cache. To limit that to a single domain, you'd have to create a separate cache for each domain.
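Short of a separate cache per domain, one workaround is to scan the cache directory and delete only the files that belong to one host. This relies on two assumptions. First, the cache key must be `$host$request_uri` as configured above. Second, nginx must write a `KEY: <cache key>` line near the top of every cache file. nginx's on-disk format does this, but it is an internal detail rather than a documented interface. Treat `purge_domain` as a hypothetical sketch, not a supported nginx feature. It deletes files behind nginx's back, just as the rm -rf approach does, only more narrowly.

```python
"""Purge one domain's pages by scanning nginx cache files (hypothetical sketch).

Assumes fastcgi_cache_key $host$request_uri, so every key starts with the
hostname, and that each cache file contains a "\\nKEY: <key>" header line.
"""
import os
import tempfile


def purge_domain(cache_root: str, host: str, header_bytes: int = 4096) -> int:
    """Delete cache files whose key begins with host + "/"; return the count."""
    # The trailing "/" keeps "a.com" from also matching "a.com.au".
    marker = b"\nKEY: " + host.encode("ascii") + b"/"
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    head = f.read(header_bytes)  # the key sits in the header
            except OSError:
                continue  # vanished (cache manager) or unreadable: skip it
            if marker in head:
                try:
                    os.remove(path)
                    removed += 1
                except FileNotFoundError:
                    pass
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Fake cache files: some binary header bytes, then the KEY line.
        keys = ["mysite.com/node/1", "mysite.com/forum/12", "other.org/node/1"]
        for i, key in enumerate(keys):
            with open(os.path.join(root, f"entry{i}"), "wb") as f:
                f.write(b"\x00" * 16 + b"\nKEY: " + key.encode() + b"\npage body")
        print(purge_domain(root, "mysite.com"))  # 2
        print(os.listdir(root))  # ['entry2']
```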
If you try this configuration, you should also use my solution for caching aggregated JS and CSS files, or you'll run into problems for the reasons explained there: http://groups.drupal.org/node/124709
While this configuration has some issues, for simpler sites it can provide fast and easy reverse proxy caching with very little RAM overhead.
This is very much a work in progress and I'm hoping you guys can provide some feedback and spot the problems. Thanks.