Separate Old Content

Hello, I have been developing with Drupal for a while now, but I'm new to the high-performance side of things. I am working with a site that has about 105,000 nodes and keeps growing at a fast rate. Most nodes get put in an archived state after a week or so, so most of the content served on the site is very recent, but we want to keep the old content because search engines have indexed it, and our authors and content providers look up their old articles to reference in new ones. The views on the site filter content based on the value of that archived field, so they have to search and filter through over 100,000 records just to find the few records they actually need to pull up. We have several levels of load balancing and caching (with Boost) to speed things up, but there is one section of the site that needs to be rebuilt at certain times when new content is added. In our environment, we get huge spikes in traffic during certain events. During those events our content authors are usually posting articles and blogs, which triggers those sections to be rebuilt right when visitor traffic peaks, and under that heavy load these queries take a very long time to execute.
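
At a minimum I assume the archived flag needs an index so the views queries don't scan the whole table. Something like the statement below is what I have in mind; the table and column names are just guesses based on a CCK field called "archived", so they would need to be adjusted to the real schema:

-- assumed CCK field table for the "archived" flag; adjust names to the actual schema
ALTER TABLE content_field_archived
  ADD INDEX idx_archived_nid (field_archived_value, nid);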

I have been doing a lot of research and reading through many of the posts in this group, and have gotten some great information. I am wondering if there is a way to export these archived nodes to new tables or a separate database, so that the queries run on the site only filter through the active content, while the old content stays available for the few situations where it needs to be accessed. I have thought about serving all the archived content as static HTML files, as I read in one of the posts here, but that won't let those pages pick up updated menus, blocks, and styling.
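
To make that idea concrete, roughly what I'm picturing is something like the SQL below. The table and column names are only assumptions for a CCK integer "archived" flag, and I realize a real migration would also have to deal with node_revisions, the other field tables, url_alias entries, and keeping node_load() working for archived paths:

-- create an archive table with the same structure as node
CREATE TABLE node_archive LIKE node;

-- copy archived nodes into the archive table
INSERT INTO node_archive
SELECT n.*
FROM node n
JOIN content_field_archived a ON a.nid = n.nid AND a.vid = n.vid
WHERE a.field_archived_value = 1;

-- remove them from the main node table so active queries scan fewer rows
DELETE n
FROM node n
JOIN content_field_archived a ON a.nid = n.nid AND a.vid = n.vid
WHERE a.field_archived_value = 1;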

I am figuring some other organizations, especially news organizations, must have run into this problem before, and I am wondering what people have done to get around it. One thought I had was creating another instance of Drupal, building a migration path that copies all the non-node-specific data plus the node-specific data for just the archived nodes, changing the paths for those nodes, and setting the load balancer to serve those paths from the other instance. This would mess up the indexing for search engines, but we would update our sitemap to hopefully account for it. Not ideal SEO-wise, but it is one way to limit the number of records each query has to scan.

I know I have a lot of query optimization to do, but I think if I cut the number of records that get searched and filtered by 90%, I could more accurately assess my queries and figure out how to optimize them further.
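
Once the active set is that small, my plan was to EXPLAIN the queries that Views generates (captured from the slow query log or devel's query log) and see which indexes are actually being used. A simplified stand-in for one of those queries would be something like this, where the field table and content type names are placeholders for whatever is in our schema:

-- simplified stand-in for a Views-generated listing query
EXPLAIN
SELECT n.nid, n.title, n.created
FROM node n
JOIN content_field_archived a ON a.vid = n.vid
WHERE n.status = 1
  AND n.type = 'article'
  AND a.field_archived_value = 0
ORDER BY n.created DESC
LIMIT 10;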

Any thoughts or suggestions would be greatly appreciated,

Thanks,

