At CornellSun.com, an article of ours about the deaths of two students in less than 24 hours was linked from the front page of huffingtonpost.com. While the link was up, our site was overwhelmed with traffic well beyond what our server could handle. We're working on a plan to handle a similar situation in the future, but I thought it would be a good idea to share our experience with this group and also get some feedback.
Here are some of the issues we experienced (we're running MediaTemple's (dv) Extreme server):
- Pages often would take forever to load or fail to load
- Attempts to start shell sessions on the server via ssh would often fail
- Shell commands (including attempts to stop/restart Apache) often failed due to lack of memory
- Apache would occasionally go down
- After Apache was restarted, shell commands would run into memory issues again within 15 seconds
As you can see, that is a lot of traffic, and it is almost exclusively anonymous users. I recall an earlier article of ours ending up on Slashdot and the Drudge Report (though not in a very prominent spot), which our server seemed able to handle, but this barrage from the Huffington Post proved too much.
While Drupal has good caching for pages served to anonymous users, it obviously wasn't enough for us.
We resorted to creating a separate static page which stripped out nearly everything: images, JavaScript, interactive features, etc. If I recall correctly, the static page required only 3 HTTP requests to our server: 1 for the static page itself and, within the page, 1 for the masthead image and 1 for the CSS. Any remaining JavaScript was either inline or loaded from an external site (e.g., Google). I believe (but am not 100% sure) that the deluge of HTTP requests was a major reason the memory usage was so high. I recall CPU usage being higher than usual, but not intolerably so.
After creating the static page, we got the URL for that story to redirect to this static page (I did not do that task, but I remember that it was done in our .htaccess file). One disadvantage to this approach was that the URL which appeared in the browser was that of the static page (http://cornellsun.com/student-deaths.html) instead of the original URL (http://cornellsun.com/section/news/content/2010/03/15/cornell-community-...). Ideally, in the future I still would like users to see the URL of the original story, so that if users shared or saved the link when the static page was served, they could get the normal version once we could start serving it again.
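One way to keep the original URL in the browser would be an internal rewrite instead of an external redirect. The fragment below is only a sketch and not what we actually ran: the story path `section/news/content/example-story` is a hypothetical stand-in for the real (truncated) URL, and it assumes mod_rewrite is enabled and that `student-deaths.html` sits in the document root.

```apache
# Hypothetical .htaccess sketch; the story path is a placeholder, not the real URL.
RewriteEngine On

# No R flag and a local target, so Apache serves the static file internally
# and the visitor's address bar keeps showing the original story URL.
RewriteRule ^section/news/content/example-story$ /student-deaths.html [L]
```

A rule like this would need to appear before Drupal's own catch-all rewrite to index.php in the same .htaccess file, otherwise the request would reach Drupal first. Removing the rule afterwards would make the same URL serve the normal Drupal page again.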
After we did this, the site was still kind of slow, but it seemed to be tolerable at this point, and editors working with the website were able to get their work done (whereas that was not the case earlier).
In terms of developing a solution, one thing to keep in mind is that while newspapers obviously want to handle links from these high-volume news sites, such links are also very rare. You want to handle this scenario, which happens 1% of the time, but you don't want to do something which risks messing up the site the other 99% of the time.
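One possible way to limit the risk for the other 99% of the time is to leave the fallback rule in place permanently but have it apply only while a flag file exists. This is a sketch under assumptions: the flag file name `fallback-enabled.flag` and the story path are made up for illustration, and it assumes mod_rewrite is available in .htaccess.

```apache
# Hypothetical .htaccess sketch; the file name and story path are placeholders.
RewriteEngine On

# The rule below only fires while the flag file exists in the document root.
RewriteCond %{DOCUMENT_ROOT}/fallback-enabled.flag -f
RewriteRule ^section/news/content/example-story$ /student-deaths.html [L]
```

With something like this, creating the flag file (for example with `touch`) would switch the static version on, and deleting it would switch it off, without editing .htaccess while the server is under load. During normal operation the condition fails and the request goes to Drupal as usual.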