How to optemise the query having group by on two colomun

phpgurullc · March 23, 2010, 10:00am

i have one table say A

±-----------±-------------±---------------+
| value1 | value 2 | value 3 |
±-----------±-------------±---------------+
| 5 | 4adb83c1d71b | 1 |
| 5 | 44fcb93589fe | 2 |
| 5 | 4182b6e2100c | 1 |

| 6 | 4bfd8db46ab1 | 1 |

| 7 | 4b0f87c68808 | 1 |
| 7 | 413fa4646dfe | 2 |
| 7 | 4585b5671b68 | 3 |
| 7 | 4486abc0ddce | 2 |
| 7 | 4b01b048f07c | 5 |
| 7 | 4cebb754ff21 | 2 |
±-----------±-------------±---------------+

Now i need an optemise query that will give number of duplicate in value 3 colomun for each value 1 colomun

mean i need a result as

±-----------±-------------±---------------+
| value1 | value 3 | count |
±-----------±-------------±---------------+
| 5 | 1 | 2 |
| 6 | 1 | 1 |
| 7 | 2 | 3 |
±-----------±-------------±---------------+

mean
value 1(5) have value 3(1) repeate 2 time
value 1(6) have value 3(1) repeate 1 time
value 1(7) have value 3(2) repeate 3 time

for this i write a query but that query is not optemise so if have any better solution for query.

select value1,value3,count(value1) as ct from A where value1>0 and value1 != ‘’ group by value1,value2 having ct >1 limit 3;

where value 1 and value 3 are indexed

r937 · March 23, 2010, 10:15am

ALTER TABLE a ADD INDEX(value1,value2)

phpgurullc · March 23, 2010, 12:22pm

select value1,value3,count(value1) as ct from A where value1>0 and value1 != ‘’ group by value1,value3 having ct >1 limit 3;

value 1 and value 3 are already index then also it scan the whole row

r937 · March 23, 2010, 2:13pm

ALTER TABLE a ADD INDEX(value1,value3)

phpgurullc · March 23, 2010, 2:17pm

value1 and value 3 are already index in table so this does not work

r937 · March 23, 2010, 2:20pm

sure it does

i am not suggesting separate indexes on those columns individually, i am suggesting a single composite index on both columns

phpgurullc · March 23, 2010, 2:34pm

±—±------------±--------------------±------±----------------------------------------------------±-------------±--------±-----±--------±----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±--------------------±------±----------------------------------------------------±-------------±--------±-----±--------±----------------------------------------------------------+
| 1 | SIMPLE | A | range | value1_3 | value1_3 | 8 | NULL | 3124145 | Using where; Using index; Using temporary; Using filesort |
±—±------------±--------------------±------±----------------------------------------------------±-------------±--------±-----±--------±----------------------------------------------------------+

so as we see from here value1_3 is index of (value1 and value3)

Still expensive

r937 · March 23, 2010, 2:40pm

compared to what? you’re reading three million rows!!!

phpgurullc · March 23, 2010, 2:47pm

So its scanning the whole table

r937 · March 23, 2010, 3:00pm

actually, it’s not accessing the table at all, just the index

you’re using an ORDER BY clause, right?

phpgurullc · March 23, 2010, 3:03pm

its group by clause

r937 · March 23, 2010, 3:07pm

so there is no ORDER BY clause?

phpgurullc · March 23, 2010, 3:10pm

Dear Just go through this query,we are not using order by we are using group by

select value1,value3,count(value1) as ct from A where value1>0 and value1 != ‘’ group by value1,value2 having ct >1 limit 3;

r937 · March 23, 2010, 3:20pm

there’s your problem… you’re grouping by something, but selecting something else

phpgurullc · March 23, 2010, 3:22pm

sorry its typo error
select value1,value3,count(value1) as ct from A where value1>0 and value1 != ‘’ group by value1,value3 having ct >1 limit 3;

r937 · March 23, 2010, 3:38pm

okay, i don’t think i can keep dancing around the real issue here with fake table and column names

is there any chance you can show your actual table and actual query?

do a SHOW CREATE TABLE for me please

phpgurullc · March 23, 2010, 3:43pm

Table Structure is as follows

| gallery_event_image | CREATE TABLE gallery_event_image (
gallery_id bigint(20) unsigned NOT NULL,
image_id varchar(20) NOT NULL,
image_position smallint(5) unsigned DEFAULT NULL,
UNIQUE KEY gallery_id_2 (gallery_id,image_id),
KEY gallery_id (gallery_id),
KEY image_id (image_id),
KEY image_position (image_position),
KEY gallery_id_3 (gallery_id,image_position)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |

and real query

select gallery_id,image_position,count(gallery_id) as ct from gallery_event_image WHERE image_position < 255 and gallery_id!=0 and gallery_id != ‘’ group by gallery_id,image_position having ct >1 order by ct asc limit 500;

r937 · March 23, 2010, 3:50pm

well, whaddya know… there ~was~ an ORDER BY clause after all

phpgurullc · March 23, 2010, 3:51pm

ok, so how can we solve it

r937 · March 23, 2010, 3:55pm

you can’t solve it

if you want the lowest 500 counts, you’re going to have to compute all the counts, right?

that’s what using temporary and using filesort mean – each gallery_id/image_position combination is counted, the results are placed into a temporary file, and then this file is sorted into ascending sequence by the count

that’s what you asked for, and your query is as optimized as it’s going to get