IN Subqueries in MySQL 5.6 Are Optimized Away

In MySQL 5.6, it looks like IN() subqueries are optimized even better than they are in MariaDB 5.5. Here’s a typical IN() subquery, using the sakila sample database (query taken from slide 6 of the presentation about new MySQL 5.6 … Continue reading

In MySQL 5.6, it looks like IN() subqueries are optimized even better than they are in MariaDB 5.5. Heres a typical IN() subquery, using the sakila sample database (query taken from slide 6 of the presentation about new MySQL 5.6 optimizer statistics):

SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)

Before there were any subquery optimizations, say if you are using MySQL 5.1, you would have to rewrite this query as a JOIN, to avoid the dreaded DEPENDENT SUBQUERY that shows up in the EXPLAIN:

mysql> explain SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: NULL
key: idx_title
key_len: 767
ref: NULL
rows: 1025
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: film_actor
type: index_subquery
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: func
rows: 1
Extra: Using index

MariaDB 5.5 boasts subquery optimization, and rightfully so. It looks like MariaDB materializes the subquery:

MariaDB [sakila]> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1043
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table:
type: eq_ref
possible_keys: distinct_key
key: distinct_key
key_len: 2
ref: func
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: film_actor
type: index
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: NULL
rows: 4889
Extra: Using index

So MariaDB recognizes the subquery and optimizes it. But it is still optimized as a subquery. There are 3 rows here, a new <subquery2> table is used to help optimize the results.

In MySQL 5.6, the subquery is actually optimized away. The optimizer actually changes it. You can see this in the EXPLAIN plan by looking at the select_type. In both the MySQL 5.1 and MariaDB 5.5 examples, the select_type is PRIMARY, which is used for the outer query in a subquery (or the first SELECT in a UNION, but that does not apply here). In MySQL 5.6, the select_type is SIMPLE for both rows. Note that MySQL 5.6 also does not have to add a third table as MariaDB does:

mysql> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1000
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: sakila.film.film_id
rows: 1
Extra: Using index; FirstMatch(film)
2 rows in set (0.00 sec)

In the presentation, the Oracle team says that for DBT3 Query #18, execution time reduces from days to seconds. With optimizations like this, I believe it!