Sunday, November 02, 2008

Old challenges, new synthax ...

I've blogged before about the new SQL synthax which is becoming available in databases and how it helps solving questions which are increasingly common.

Now it's time for another example, something which doesn't come up often in a reporting environment as most tools have this feature, but can be a problem if you're building your output with a scipting language.
Adding a "Total" row at the end of your tabular output, that's it! Here you can find the Devshed forum post that prompted this summary.

Aside from the classical solution using a UNION and an ORDER BY query, I suggested using the new WITH ROLLUP clause, which is implemented by many commercial databases and MySQL (available from 4.1).
Unfortunately MySQL's implementations poses some limitations to the ORDER BY clause, actually it's impossible to directly order by the aggregate column, which is a quite common requirement, on with the main problem using the Sakila sample database.

An example requirement could be showing all sales by category and a row holding the grand total.
In Sakila the basic query would be:

  1. SELECT
  2. c.name AS category,
  3. SUM(p.amount) AS total_sales
  4. FROM
  5. sakila.payment p
  6. JOIN sakila.rental r
  7. ON p.rental_id = r.rental_id
  8. JOIN sakila.inventory i
  9. ON r.inventory_id = i.inventory_id
  10. JOIN sakila.film f
  11. ON i.film_id = f.film_id
  12. JOIN sakila.film_category fc
  13. ON f.film_id = fc.film_id
  14. JOIN sakila.category c
  15. ON fc.category_id = c.category_id
  16. GROUP BY
  17. c.name;


As I said a requirement based on ordering by total_sales in MySQL can't be satisfied without using a trick, like nesting selects, and so I'll skip it (BTW: this made me notice a bad practice left around in Sakila, usage of order by in views, see "sales_by_film_category") .

The traditional solution for adding a total row would be:

  1. SELECT
  2. c.name AS category,
  3. SUM(p.amount) AS total_sales
  4. FROM
  5. sakila.payment p
  6. JOIN sakila.rental r
  7. ON p.rental_id = r.rental_id
  8. JOIN sakila.inventory i
  9. ON r.inventory_id = i.inventory_id
  10. JOIN sakila.film f
  11. ON i.film_id = f.film_id
  12. JOIN sakila.film_category fc
  13. ON f.film_id = fc.film_id
  14. JOIN sakila.category c
  15. ON fc.category_id = c.category_id
  16. GROUP BY
  17. c.name
  18. UNION ALL
  19. SELECT
  20. NULL,
  21. SUM(p.amount) AS total_sales
  22. FROM
  23. sakila.payment p
  24. JOIN sakila.rental r
  25. ON p.rental_id = r.rental_id
  26. JOIN sakila.inventory i
  27. ON r.inventory_id = i.inventory_id
  28. JOIN sakila.film f
  29. ON i.film_id = f.film_id
  30. JOIN sakila.film_category fc
  31. ON f.film_id = fc.film_id
  32. JOIN sakila.category c
  33. ON fc.category_id = c.category_id
  34. ORDER
  35. BY CASE WHEN category IS NULL
  36. THEN 'last'
  37. ELSE 'first' END
  38. , category

As you can see it can be quite hard to read ... the order by trick is last.

The explain plan can become quite nasty too ... (MySQLPerformanceBlog wrote about this earlier), now have a look at the more modern solution:

  1. SELECT
  2. c.name AS category,
  3. SUM(p.amount) AS total_sales
  4. FROM
  5. sakila.payment p
  6. JOIN sakila.rental r
  7. ON p.rental_id = r.rental_id
  8. JOIN sakila.inventory i
  9. ON r.inventory_id = i.inventory_id
  10. JOIN sakila.film f
  11. ON i.film_id = f.film_id
  12. JOIN sakila.film_category fc
  13. ON f.film_id = fc.film_id
  14. JOIN sakila.category c
  15. ON fc.category_id = c.category_id
  16. GROUP BY
  17. c.name
  18. WITH rollup;

Much simpler, isn't it?

But what about efficiency?
Let's look at explain plans for those queries:

First one, the classic way:

mysql> EXPLAIN EXTENDED select
-> c.name AS category,
-> sum(p.amount) AS total_sales
-> from
-> sakila.payment p
-> join sakila.rental r
-> on p.rental_id = r.rental_id
-> join sakila.inventory i
-> on r.inventory_id = i.inventory_id
-> join sakila.film f
-> on i.film_id = f.film_id
-> join sakila.film_category fc
-> on f.film_id = fc.film_id
-> join sakila.category c
-> on fc.category_id = c.category_id
-> group by
-> c.name
-> union all
-> select
-> null,
-> sum(p.amount) AS total_sales
-> from
-> sakila.payment p
-> join sakila.rental r
-> on p.rental_id = r.rental_id
-> join sakila.inventory i
-> on r.inventory_id = i.inventory_id
-> join sakila.film f
-> on i.film_id = f.film_id
-> join sakila.film_category fc
-> on f.film_id = fc.film_id
-> join sakila.category c
-> on fc.category_id = c.category_id
-> ORDER
-> BY CASE WHEN category IS NULL
-> THEN 'last'
-> ELSE 'first' END
-> , category;
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys |key | key_len | ref | rows | filtered |Extra |
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
| 1 | PRIMARY | c | ALL | PRIMARY |NULL | NULL | NULL | 16 | 100.00 |Using temporary; Using filesort |
| 1 | PRIMARY | fc | ref | PRIMARY,fk_film_category_category |fk_film_category_category | 1 | sakila.c.category_id | 9 | 100.00 |Using index |
| 1 | PRIMARY | f | eq_ref | PRIMARY |PRIMARY | 2 | sakila.fc.film_id | 1 | 100.00 |Using index |
| 1 | PRIMARY | i | ref | PRIMARY,idx_fk_film_id |idx_fk_film_id | 2 | sakila.fc.film_id | 2 | 100.00 |Using index |
| 1 | PRIMARY | r | ref | PRIMARY,idx_fk_inventory_id |idx_fk_inventory_id | 3 | sakila.i.inventory_id | 1 | 100.00 |Using index |
| 1 | PRIMARY | p | ref | fk_payment_rental |fk_payment_rental | 5 | sakila.r.rental_id | 1 | 100.00 |Using where |
| 2 | UNION | c | index | PRIMARY |PRIMARY | 1 | NULL | 16 | 100.00 |Using index |
| 2 | UNION | fc | ref | PRIMARY,fk_film_category_category |fk_film_category_category | 1 | sakila.c.category_id | 9 | 100.00 |Using index |
| 2 | UNION | f | eq_ref | PRIMARY |PRIMARY | 2 | sakila.fc.film_id | 1 | 100.00 |Using index |
| 2 | UNION | i | ref | PRIMARY,idx_fk_film_id |idx_fk_film_id | 2 | sakila.fc.film_id | 2 | 100.00 |Using index |
| 2 | UNION | r | ref | PRIMARY,idx_fk_inventory_id |idx_fk_inventory_id | 3 | sakila.i.inventory_id | 1 | 100.00 |Using index |
| 2 | UNION | p | ref | fk_payment_rental |fk_payment_rental | 5 | sakila.r.rental_id | 1 | 100.00 |Using where |
| NULL | UNION RESULT | | ALL | NULL| NULL | NULL | NULL | NULL | NULL| Using filesort |
+----+--------------+------------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+---------------------------------+
13 rows in set, 1 warning (0.00 sec)

mysql>

Pretty bad isn't it? Two queries are executed, then glued together.

Now the second one WITH ROLLUP:

mysql> EXPLAIN EXTENDED select
-> c.name AS category,
-> sum(p.amount) AS total_sales
-> from
-> sakila.payment p
-> join sakila.rental r
-> on p.rental_id = r.rental_id
-> join sakila.inventory i
-> on r.inventory_id = i.inventory_id
-> join sakila.film f
-> on i.film_id = f.film_id
-> join sakila.film_category fc
-> on f.film_id = fc.film_id
-> join sakila.category c
-> on fc.category_id = c.category_id
-> group by
-> c.name
-> with rollup;
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
| 1 | SIMPLE | c | ALL | PRIMARY | NULL | NULL | NULL | 16 | 100.00 | Using filesort |
| 1 | SIMPLE | fc | ref | PRIMARY,fk_film_category_category | fk_film_category_category | 1 | sakila.c.category_id | 9 | 100.00 | Using index |
| 1 | SIMPLE | f | eq_ref | PRIMARY | PRIMARY | 2 | sakila.fc.film_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | i | ref | PRIMARY,idx_fk_film_id | idx_fk_film_id | 2 | sakila.fc.film_id | 2 | 100.00 | Using index |
| 1 | SIMPLE | r | ref | PRIMARY,idx_fk_inventory_id | idx_fk_inventory_id | 3 | sakila.i.inventory_id | 1 | 100.00 | Using index |
| 1 | SIMPLE | p | ref | fk_payment_rental | fk_payment_rental | 5 | sakila.r.rental_id | 1 | 100.00 | Using where |
+----+-------------+-------+--------+-----------------------------------+---------------------------+---------+-----------------------+------+----------+----------------+
6 rows in set, 1 warning (0.00 sec)

mysql>

A lot better, isn't it? And faster ;)

BTW: More about EPLAIN EXTENDED here.