Optimizing MIN and MAX MySQL Functions
MySQL can optimize aggregate functions like MIN and MAX as long as the columns specified are indexed. This means that, in the case of MIN and MAX, the optimizer should be able to identify the highest and lowest values of an indexed column from the B-Tree index. Say I have a table like below:
CREATE TABLE `history` ( `h_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `u_id` int(10) unsigned NOT NULL, `cn_id` int(10) unsigned NOT NULL, `f_id` int(10) unsigned NOT NULL PRIMARY KEY (`h_id`) ) ENGINE=InnoDB
If I want to get the MAX value for cn_id, I’d to a query like this which will be a full table scan:
mysql (test) > EXPLAIN SELECT MAX(cn_id) FROM history \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: history type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 19908716 Extra: 1 row in set (0.00 sec)
If I add an index on cn_id, the query is optimized internally:
mysql (test) > EXPLAIN SELECT MAX(cn_id) FROM history \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: Select tables optimized away 1 row in set (0.00 sec)
What if I’d want to get MAX(cn_id) for a particular u_id? Well, another full table scan!
mysql (test) > EXPLAIN SELECT MAX(cn_id) FROM history WHERE u_id = 106476 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: history type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 19908716 Extra: Using where 1 row in set (0.00 sec)
Again the solution is simple, make sure that cn_id is part of an index including the u_id, in this case I created a composite key on (u_id, cn_id) so that once the optimizer found index records for u_id = 106476, within the same index record are the cn_id values.
mysql (test) > EXPLAIN SELECT MAX(cn_id) FROM history WHERE u_id = 106476 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: Select tables optimized away 1 row in set (0.00 sec)
Till next time!
UPDATE: Based on Daniel’s comment, to demonstrate the internal optimization for MIN and MAX with the proper keys in place, here is a sample query executed:
mysql (test) > EXPLAIN PARTITIONS SELECT MAX(h_date) FROM history WHERE t_id = 10947385 \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: NULL partitions: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: Select tables optimized away 1 row in set (0.00 sec) mysql (test) > pager md5sum PAGER set to 'md5sum' mysql (test) > flush status; Query OK, 0 rows affected (0.02 sec) mysql (test) > SELECT MAX(h_date) FROM history WHERE t_id = 10947385 \G 54f1cd94f9bd1390899b13244443968f - 1 row in set (0.00 sec) mysql (test) > nopager PAGER set to stdout mysql (test) > SHOW STATUS LIKE 'handler%'; +----------------------------+-------+ | Variable_name | Value | +----------------------------+-------+ | Handler_commit | 1 | | Handler_delete | 0 | | Handler_discover | 0 | | Handler_prepare | 0 | | Handler_read_first | 0 | | Handler_read_key | 1 | | Handler_read_last | 0 | | Handler_read_next | 0 | | Handler_read_prev | 0 | | Handler_read_rnd | 0 | | Handler_read_rnd_next | 0 | | Handler_rollback | 0 | | Handler_savepoint | 0 | | Handler_savepoint_rollback | 0 | | Handler_update | 0 | | Handler_write | 0 | +----------------------------+-------+ 16 rows in set (0.04 sec)
Comments
8 Comments
http://bugs.mysql.com/bug.php?id=66187
Thanks Justin, seems to be something to watch out for too. I tested with 5.5.30 and it looks to have been solved on that version. See my comment on the bug. 🙂
If it worked you would see ‘select tables optimized away’. Please check your test.
Justin,
Yes I just realized, actually the EXPLAIN output is correct but my test was wrong, I had a key on t_id instead which was the point of the bug.
Keep up the great blogs – you’re touching on some great topics, and sometimes you have “bugs” in your blogs but overall they’re great.
It is a good blog post. I just wanted Jervin to be aware that there was a bug that affects this particular feature. Also, my comments were not about the test in this post, but the comment he filed against the open bug I linked to.
It’s great to see the explain plans, but what happens if you actually run the query? You probably want to see the execution time and the handler status variables.
Sheeri,
Thanks for the feedback!
Daniel,
I’ve made a little update to demonstrate that, thanks!
Leave a Comment