We have just installed the latest version of Drupal 5 on a new site and loaded it up with content, about 4.1 million nodes. These nodes are split between three content types, with around 2.6 million, 1.5 million and 70,000 nodes respectively.
Hitting the administration menu: Content management -> Content destroys performance beyond anything I've seen :)
This is the query that totally smothers performance, taken from the developer module:
337659.39 0 pager_query SELECT n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50
Why?
My opinion is that such an expensive query shouldn't be allowed in a default installation of Drupal. The implication is that Drupal cannot grow beyond the scope of "small sites" without tweaking the core code.
I'm not even sure what the query is doing. Sorting the node table on the changed field and then fetching data for the first 50 nodes should be pretty straightforward. Why the join with the users table? The node table already has a uid field. It seems to me that getting the name of the user for each node afterwards would be much faster (it is a lot faster), so what is the real reason for that join?
Running this query a few times in a row brings it down to 2.5 minutes:
SELECT n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
This query returns its result set in 0.00 seconds (after having been run once; the first run completed in 9.5 seconds, then the query cache kicks in):
SELECT * FROM node ORDER BY changed DESC LIMIT 0, 50;
Then, in the code that loops through the result set, you could easily get the name of the user for each uid (and it would be SO much faster!)
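To make the idea concrete, here is a rough sketch of that two-step lookup. It is written in Python against an in-memory SQLite database purely for illustration, not in Drupal's PHP/MySQL. The node and users tables only carry the columns used in the queries above, and the function name recent_nodes_with_names is made up for this example.

```python
import sqlite3

def recent_nodes_with_names(conn, limit=50):
    """Hypothetical helper: newest nodes first, user names resolved separately."""
    # Step 1: sort and limit on the node table alone, without the join.
    nodes = conn.execute(
        "SELECT nid, uid, changed FROM node ORDER BY changed DESC LIMIT ?",
        (limit,),
    ).fetchall()
    # Step 2: look up names only for the distinct uids on this page.
    uids = sorted({uid for _, uid, _ in nodes})
    if not uids:
        return []
    placeholders = ",".join("?" for _ in uids)
    names = dict(conn.execute(
        f"SELECT uid, name FROM users WHERE uid IN ({placeholders})", uids
    ).fetchall())
    # Attach the name to each node row, keeping the sorted order.
    return [(nid, names.get(uid), changed) for nid, uid, changed in nodes]

# Tiny demo data set (assumed schema, reduced to the columns used here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, uid INTEGER, changed INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?)",
    [(10, 1, 100), (11, 2, 300), (12, 1, 200)],
)
rows = recent_nodes_with_names(conn, limit=2)
print(rows)
```

One difference from the original query: the INNER JOIN silently drops nodes whose uid has no matching users row, while this sketch keeps them with an empty name.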
I feel a default installation of Drupal should not buckle under even a semi-large number of nodes in its database, especially when we aren't doing anything fancy besides clicking on a few administration links.
How do other people with semi-large sites handle this? Or are we doing something wrong?
I tried searching for this, since I figured this would be a quite common problem for anyone with a mature number of nodes, but I couldn't find anything... So maybe this is not a problem but something we haven't configured or done correctly.
How many other cumbersome SQL queries will we stumble on later?
Right now I see no other solution than to change the code so standard administration will actually work...
Settings for PHP:
max_execution_time = 800
max_input_time = 260
memory_limit = 50M
Total memory is 8GB, with another 8GB of swap, running on a dual quad-core Intel server from Dell
Running Ubuntu server 7.04
PS.
"The Performance and Scalability forum on Drupal.org is where else to look."
Maybe that line should be removed from the groups header, since that forum is deprecated?