How To Troubleshoot A Server Under Load

We recently upgraded our web server from MySQL 4.1 to 5.1 (we did this in parallel with a major hardware upgrade as well). We had some initial issues with MySQL 5.1 optimizing queries in a different way than 4.1, but I’ve worked through all of those issues I think. When I run queries on my test machine or on production during off-peak hours, the queries come back super fast (<0.1 sec for most <1 sec for all). The problem, however, is that occasionally a query that runs fairly often will show up in my slow log as having run for >5.0 seconds (sometimes they’ll show >10 seconds). Rarely is there another slow query within an hour of it, so I don’t believe there is any type of locking/blocking going on (all the queries I’m working on are SELECTs btw).

What can I do to figure out why the query is randomly taking so long? I have identified some server settings that could possibly be increased (max_heap_size, read_rnd_buffer_size and thread_concurrency), but I don’t want to bump stuff up blindly in hopes of fixing something I really don’t understand.

Basic server specs:
Linux/Apache/MySQL/PHP
MySQL 5.1.58
2x 6-core Intel processors
8GB of memory

My.cnf (snipped a little):

[mysqld]
slow-query-log = 1
slow_query_log_file = /var/log/mysqlslow.log
long_query_time = 5
skip-name-resolve
skip-locking
set-variable    = read_buffer_size=2M
set-variable    = key_buffer=512M
set-variable    = max_connections=200
set-variable    = max_allowed_packet=12M
set-variable    = table_cache=2048
set-variable    = sort_buffer=5M
set-variable    = record_buffer=2M
set-variable    = net_buffer_length=16K
set-variable    = myisam_sort_buffer_size=10M
set-variable    = thread_concurrency=2
set-variable    = tmp_table_size=512M
set-variable    = query_cache_limit=12M
set-variable    = query_cache_size=512M
set-variable    = query_cache_type=1
set-variable    = ft_min_word_len=2

I am willing to post a troublesome query if interested (be warned it is massive!), but in the meantime, here’s the explain for it to show that it is relatively optimized.

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: products
         type: ref
possible_keys: s01_Products_1,active+code
          key: active+code
      key_len: 2
          ref: const
         rows: 3101
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: attr_filter
         type: ref
possible_keys: s01_Attributes_1,product_id
          key: product_id
      key_len: 4
          ref: MYSQL_DATABASE_NAME.products.id
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: mus_catxprod_filter
         type: ref
possible_keys: prod_id
          key: prod_id
      key_len: 4
          ref: MYSQL_DATABASE_NAME.products.id
         rows: 2
        Extra:
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: mus_cat_filter
         type: eq_ref
possible_keys: PRIMARY,active
          key: PRIMARY
      key_len: 4
          ref: MYSQL_DATABASE_NAME.mus_catxprod_filter.cat_id
         rows: 1
        Extra:
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE
        table: mus_parentcat_filter
         type: eq_ref
possible_keys: PRIMARY,active
          key: PRIMARY
      key_len: 4
          ref: func
         rows: 1
        Extra: Using where
*************************** 6. row ***************************
           id: 1
  select_type: SIMPLE
        table: opt_filter
         type: ref
possible_keys: attr_id+disp_order
          key: attr_id+disp_order
      key_len: 5
          ref: MYSQL_DATABASE_NAME.attr_filter.id
         rows: 3
        Extra: Using where
*************************** 7. row ***************************
           id: 1
  select_type: SIMPLE
        table: mus_color_slug
         type: ref
possible_keys: filter_type+lookup_id
          key: filter_type+lookup_id
      key_len: 12
          ref: const
         rows: 16
        Extra: Using where
*************************** 8. row ***************************
           id: 1
  select_type: SIMPLE
        table: mus_color_filter
         type: eq_ref
possible_keys: PRIMARY,active
          key: PRIMARY
      key_len: 4
          ref: MYSQL_DATABASE_NAME.mus_color_slug.lookup_id
         rows: 1
        Extra: Using where
*************************** 9. row ***************************
           id: 1
  select_type: SIMPLE
        table: mus_attr_colorstranslationsxcolors
         type: eq_ref
possible_keys: colortranslation_id+color_id
          key: colortranslation_id+color_id
      key_len: 8
          ref: MYSQL_DATABASE_NAME.opt_filter.colortranslation_id,MYSQL_DATABASE_NAME.mus_color_filter.id
         rows: 1
        Extra: Using where; Using index
9 rows in set (0.00 sec)

This query on my development machine returns 14 rows in ~0.25 seconds. It appeared in the slow log as having taken just over 5 seconds to run on production during a relatively off peak time.

Any ideas on where to begin? Are there any Windows MySQL tools I can run to fake some traffic? If so, with that running, what do I want to be watching?

Thanks to anyone that can help!