Trimming the fat from a query

Hi there, I am trying to trim down this query to something a little more readable.

The query itself creates a view from a real table called ‘network’, only returning the latest value for each unique nic_name for a particular key (latest being defined by the datetimestamp value)

CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `network_view` AS select 
`t2`.`key` AS `key`,
`t2`.`nic_name` AS `nic_name`,
`t2`.`nic_vp` AS `nic_vp`,
`t2`.`nic_ipmp` AS `nic_ipmp`,
`t2`.`nic_ip` AS `nic_ip`,
`t2`.`nic_mac` AS `nic_mac`,
`t2`.`nic_mask` AS `nic_mask`,
`t2`.`nic_bcast` AS `nic_bcast`,
`t2`.`nic_zone` AS `nic_zone`,
`t2`.`nic_link` AS `nic_link`,
`t2`.`nic_duplex` AS `nic_duplex`,
`t2`.`nic_speed` AS `nic_speed`,
`t2`.`nic_switch` AS `nic_switch`,
`t2`.`nic_port` AS `nic_port`,
`t2`.`nic_vlan` AS `nic_vlan`,
`t2`.`nic_status` AS `nic_status`,
`t2`.`datetimestamp` AS `datetimestamp` from `network` `t2` where ((`t2`.`key`,`t2`.`nic_name`,`t2`.`datetimestamp`) in (select
`network`.`key` AS `key`,
`network`.`nic_name` AS `nic_name`,max(`network`.`datetimestamp`) AS `MAX(datetimestamp)` 
from `network` where ((`network`.`key` = `t2`.`key`) and (`network`.`nic_name` = `t2`.`nic_name`)) 
group by `network`.`key`,`network`.`nic_name`) and (`t2`.`nic_status` <> 'removed')) 
order by `t2`.`key`,`t2`.`nic_name`

I have tried this but its doesnt seem to be working

CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `original` AS select t2.key
	, t2.nic_name
	, t2.nic_vp
	, t2.nic_ipmp
	, t2.nic_ip
	, t2.nic_mac
	, t2.nic_mask
	, t2.nic_bcast
	, t2.nic_zone
	, t2.nic_link
	, t2.nic_duplex
	, t2.nic_speed
	, t2.nic_switch
	, t2.nic_port
	, t2.nic_vlan
	, t2.nic_status		
	, t2.datetimestamp FROM network t2 
	   WHERE ((t2.key,t2.nic_name,t2.datetimestamp) IN (SELECT network.key
	   , network.nic_name
	   , MAX(network.datetimestamp)
	   FROM network WHERE ((network.key = t2.key)
	   AND (network.nic_name = t2.nic_name))
	   GROUP BY network.key,network.nic_name)
	   AND(t2.nic_status &lt;&gt; 'removed'))
	ORDER BY t2.key,t2.nic_name;

Is there something I am missing when trying to simplify this query ?

any help would be greatly appreciated

try this –

CREATE 
  ALGORITHM = UNDEFINED 
  DEFINER = `root`@`localhost` 
  SQL SECURITY DEFINER 
  VIEW `network_view` 
AS 
SELECT t2.key
     , t2.nic_name
     , t2.nic_vp
     , t2.nic_ipmp
     , t2.nic_ip
     , t2.nic_mac
     , t2.nic_mask
     , t2.nic_bcast
     , t2.nic_zone
     , t2.nic_link
     , t2.nic_duplex
     , t2.nic_speed
     , t2.nic_switch
     , t2.nic_port
     , t2.nic_vlan
     , t2.nic_status
     , t2.datetimestamp
  FROM network AS t2 
INNER
  JOIN ( SELECT key
              , nic_name
              , MAX(datetimestamp)[COLOR="Blue"] AS max_datetimestamp[/COLOR]
           FROM network 
         GROUP 
             BY key
              , nic_name ) AS m
    ON m.key               = t2.key
   AND m.nic_name          = t2.nic_name
   AND m.max_datetimestamp = t2.datetimestamp
 WHERE t2.nic_status &lt;&gt; 'removed' 
ORDER 
    BY t2.key
     , t2.nic_name

ah, thanks … thats a completely different way of doing it, id just got my head around the original one :-). I think I understand it. un fortunately im getting an error when running it

ERROR 1349: View's SELECT contains a subquery in the FROM clause

Im not sure if this is just a mysql issue?

Additionally, if i wanted to add another JOIN in there so that the results also returned a field called ‘hostname’ which is retrieved from a table called ‘physical’ (ON t2.key = physical.key), would I add that in before the other JOIN are after it?

thanks again for helping

CREATE 
  ALGORITHM = UNDEFINED 
  DEFINER = `root`@`localhost` 
  SQL SECURITY DEFINER 
  VIEW `network_view` 
AS 
SELECT t2.key
     , t2.nic_name
     , t2.nic_vp
     , t2.nic_ipmp
     , t2.nic_ip
     , t2.nic_mac
     , t2.nic_mask
     , t2.nic_bcast
     , t2.nic_zone
     , t2.nic_link
     , t2.nic_duplex
     , t2.nic_speed
     , t2.nic_switch
     , t2.nic_port
     , t2.nic_vlan
     , t2.nic_status
     , t2.datetimestamp
  FROM network AS t2 
INNER
  JOIN ( SELECT key
              , nic_name
              , MAX(datetimestamp) AS max_datetimestamp
           FROM network 
         GROUP 
             BY key
              , nic_name ) AS m
    ON m.key               = t2.key
   AND m.nic_name          = t2.nic_name
   AND m.max_datetimestamp = t2.datetimestamp
 WHERE t2.nic_status &lt;&gt; 'removed' 
ORDER 
    BY t2.key
     , t2.nic_name

well, apparently you cannot create a view with a subquery in the FROM clause

i knew this once, but forgot it

the workaround is to create a view for the subquery and then join to that

as for your new question, since the additional join is also an INNER JOIN, it doesn’t matter if you put if before or after

:slight_smile:

yep, I believe that mysql limitation with the sub-queries in the FROM clause was the primary reason the query (in my original post) was written the way it was (i.e. to bypass this limitation).

as such I guess my original problem still stands, can anyone see any reason why my re-write / simplification doesn’t work ?

any help would be greatly appreciated

this is what you have –

SELECT ...
  FROM network t2 
 WHERE (
       ( t2.key,t2.nic_name,t2.datetimestamp ) 
       IN ( SELECT network.key
                 , network.nic_name
                 , MAX(network.datetimestamp)
              FROM network 
             WHERE network.key = t2.key
               AND network.nic_name = t2.nic_name
            GROUP 
                BY network.key
                 , network.nic_name )
   AND t2.nic_status <> 'removed'
       )

and this is the way i would write it –

SELECT ...
  FROM network t2 
 WHERE t2.datetimestamp = 
       ( SELECT MAX(network.datetimestamp)
           FROM network 
          WHERE network.key = t2.key
            AND network.nic_name = t2.nic_name )
   AND t2.nic_status <> 'removed'

although you might also have to put <>‘removed’ into the subquery

and of course a correlated subquery won’t perform all that well…

Hi thanks for your help. Rather than a rewriting the query as such, I really just wanted to get some help understanding what I was doing wrong with the way I was simplifying my original query. Although the original query may be a bit cludgy it works pretty well and I just wanted to make it a little more readable to those that need to understand it.

not that i don’t appreciate your help of course, which as ever is greatly appreciated :slight_smile:

OK , I take it back about not needing to re-engineer this. Ive trimmed down a little bit of the faff, and added in an additional JOIN. however, when I issue a SELECT * from this view it takes nearly 45 minutes to return (and this is on a seriously powerful box). Is there a way I can massively optimise this ?

If i do a specific SELECT against the view (as opposed to a SELECT *) it comes back in about 1-2 seconds (which is still rubbish :)). Unfortunately, SELECT * is something that certain people will need to do. 45 mins is quite a long time to wait

Am i right in saying that it is the correlated sub-queries, that are causing this to take so long

CREATE ALGORITHM=UNDEFINED
 DEFINER=`root`@`localhost` 
SQL SECURITY DEFINER 
VIEW `MYDB`.`network_test` AS
SELECT 
 t2.key AS key,
 t2.nic_name,
 t2.nic_vlan ,
 t2.nic_mask ,
 t2.nic_bcast ,
 t2.nic_zone ,
 t2.nic_status ,
 t2.datetimestamp ,
 all_servers.hostname
 FROM MYDB.network t2 
   JOIN MYDB.all_servers
   ON t2.key = all_servers.key
 WHERE ((t2.key,
 t2.nic_name,
 t2.datetimestamp) IN (SELECT
 MYDB.network.key AS key,
 MYDB.network.nic_name AS nic_name,
max(MYDB.network.datetimestamp)  
 FROM MYDB.network 
 WHERE ((MYDB.network.key = t2.key) 
 AND (MYDB.network.nic_name = t2.nic_name)) 
 GROUP BY MYDB.network.key,MYDB.network.nic_name) AND (t2.nic_status &lt;&gt; "removed")) 
 ORDER BY t2.key,t2.nic_name;

any advice would be great

analysis of performance problems requires

  1. complete understanding of all tables including their indexes
  2. EXPLAINs on the poorly performning queries

OK, so I have trimmed down the query and this is the one I am basing the following table, index and explain information on

CREATE ALGORITHM=UNDEFINED 
DEFINER=`root`@`localhost` 
SQL SECURITY DEFINER VIEW `MYDB`.`network_test` AS
SELECT 
 t2.key AS key,
 t2.nic_name,
 t2.nic_vlan ,
 t2.nic_status ,
 t2.datetimestamp 
 FROM MYDB.network t2 
 WHERE ((t2.key,
 t2.nic_name,
 t2.datetimestamp) IN (SELECT
 MYDB.network.key AS key,
 MYDB.network.nic_name AS nic_name,
max(MYDB.network.datetimestamp)  
 FROM MYDB.network 
 WHERE ((MYDB.network.key = t2.key) 
 AND (MYDB.network.nic_name = t2.nic_name)) 
 GROUP BY MYDB.network.key,MYDB.network.nic_name) AND (t2.nic_status &lt;&gt; "removed")) 
 ORDER BY t2.key,t2.nic_name;

The (newly trimmed) ‘network’ table looks like this

NETWORK TABLE

|* key *|* nic_name *|* nic_vlan *|*nic_status *|* datetimestamp *|
| 1112 | e1000g0 | 23 | online | 2010-05-16 04:27:38 |
| 1112 | e1000g1 | 21 | online | 2010-05-16 04:28:19 |
| 1112 | e1000g0 | 23 | online | 2010-05-15 01:26:39 |
| 1675 | nge0 | 20 | online | 2010-05-16 02:27:38 |
| 1675 | nge0 | 20 | online | 2010-05-14 01:26:68 |

+ [B][COLOR="Red"]Contains around 6,000 records (of which 2,500 are historical and 3,500 are current)[/COLOR][/B]

As you can see from above, The table contains records for each ‘key’ (a ‘key’ represents a physical computer). Where each unique nic_name (network interface) on a particular ‘key’ is defined with its current (defined by latest timestamp) and historical records (older timestamps) .

The NETWORK table has the following indexes (there is no unique/primary key on this table as it contains history)


	Keyname	Type	Unique	Packed	Field	Cardinality	Collation	Null	Comment
	primac	BTREE	No	No	primac	1095	A		
 	nic_ip	BTREE	No	No	nic_ip	2669	A		
	nic_name	BTREE	No	No	nic_name	200	A		
 	datetimestamp	BTREE	No	No	datetimestamp	42705	A	

The query to create the “network_test” VIEW (the one posted above) is designed to retrieve the ‘most recent’ record for each 'unique nic_name’ on a particular ‘key’. So the resulting output should (and does) look like this

|* key *|* nic_name *|* nic_vlan *|*nic_status *|* datetimestamp *|
| 1112 | e1000g0 | 23 | online | 2010-05-16 04:27:38 |
| 1112 | e1000g1 | 21 | online | 2010-05-16 04:28:19 |
| 1675 | nge0 | 20 | online | 2010-05-16 02:27:38 |

If i issue an EXPLAIN on this I get the following

mysql&gt; explain select * from network_test;
+----+--------------------+---------+------+-----------------+--------+---------+----------------+-------+----------------------------------------------+
| id | select_type        | table   | type | possible_keys   | key    | key_len | ref            | rows  | Extra                                        |
+----+--------------------+---------+------+-----------------+--------+---------+----------------+-------+----------------------------------------------+
|  1 | PRIMARY            | t2      | ALL  | NULL            | NULL   | NULL    | NULL           | 85411 | Using where; Using filesort                  | 
|  3 | DEPENDENT SUBQUERY | network | ref  | key,nic_name | key | 32      | MYDB.t2.key |    78 | Using where; Using temporary; Using filesort | 
+----+--------------------+---------+------+-----------------+--------+---------+----------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)

Im assumin the NULL in the ‘possible keys’ at the t2 stage is a big problem, I assume there isnt a way of indexing a temp table during its own creation ?

The “SELECT * FROM network_test” query takes just under 10 mins to run


3285 rows in set (9 min 40.72 sec)