Sometimes I see people having great difficulties in describing how to fetch data for a report.
They are unable to reason by sets and tend to describe things in procedural terms.
Here I'm posting a small example of how you can write a query that reproduces that procedural reasoning and lets the optimizer do the work of translating it into efficient SQL.
Say someone has a table structure like this, a main table named
guys holding their id and name and two tables
bads_attributes and
goods_attributes, if you are a bad guy your attributes will be in the bads_attributes table and vice versa.
Looks ugly? It is, but you'll find it around, sooner or later :-(
And it's not even the worst case scenario, confronted with a similar requirement I've heard that stored procedures where proposed as the ideal solution.
The table structure:
DROP TABLE IF EXISTS `test`.`guys`;
CREATE TABLE `test`.`guys` (
`guy_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`guy_name` varchar(45) NOT NULL,
PRIMARY KEY (`guy_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `test`.`bads_attributes`;
CREATE TABLE `test`.`bads_attributes` (
`attribute_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`guy_id` int(10) unsigned NOT NULL,
`attribute_name` varchar(45) NOT NULL,
PRIMARY KEY (`attribute_id`,`guy_id`),
KEY `FK_bads_attributes_1` (`guy_id`),
CONSTRAINT `FK_bads_attributes_1` FOREIGN KEY (`guy_id`) REFERENCES `guys` (`guy_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `test`.`goods_attributes`;
CREATE TABLE `test`.`goods_attributes` (
`attribute_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`guy_id` int(10) unsigned NOT NULL,
`attribute_name` varchar(45) NOT NULL,
PRIMARY KEY (`attribute_id`,`guy_id`),
KEY `FK_goods_attributes_1` (`guy_id`),
CONSTRAINT `FK_goods_attributes_1` FOREIGN KEY (`guy_id`) REFERENCES `guys` (`guy_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;
Now let's retrieve those attributes with a query that mimics the logic described above
SELECT
g.guy_id,
g.guy_name,
CASE
WHEN EXISTS (SELECT 0 FROM bads_attributes b WHERE b.guy_id = g.guy_id)
THEN (SELECT group_concat(b.attribute_name separator ', ') FROM bads_attributes b WHERE b.guy_id = g.guy_id GROUP BY b.guy_id)
WHEN EXISTS (SELECT 0 FROM goods_attributes a WHERE a.guy_id = g.guy_id)
THEN (SELECT group_concat(a.attribute_name separator ', ') FROM goods_attributes a WHERE a.guy_id = g.guy_id GROUP BY a.guy_id)
ELSE 'no attributes for this guy'
END right_attributes
FROM
guys g
This goes after the reasoning described above, if you are found in the bads_attributes then your data is retrieved from there, the same for goods_attributes.
Output is
+--------+----------+----------------------------+
| guy_id | guy_name | right_attributes |
+--------+----------+----------------------------+
| 1 | Paolo | Fichissimo |
| 2 | Carlo | Fico |
| 3 | Ciccio | Ugly, Uglier |
| 4 | Bender | Ugliest |
| 5 | New kid | no attributes for this guy |
+--------+----------+----------------------------+
5 rows in set (0.02 sec)
It's explain plan is
+----+--------------------+-------+------+-----------------------+-----------------------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+-----------------------+-----------------------+---------+---------------+------+-------------+
| 1 | PRIMARY | g | ALL | NULL | NULL | NULL | NULL | 5 | |
| 5 | DEPENDENT SUBQUERY | a | ref | FK_goods_attributes_1 | FK_goods_attributes_1 | 4 | test.g.guy_id | 1 | Using where |
| 4 | DEPENDENT SUBQUERY | a | ref | FK_goods_attributes_1 | FK_goods_attributes_1 | 4 | test.g.guy_id | 1 | Using index |
| 3 | DEPENDENT SUBQUERY | b | ref | FK_bads_attributes_1 | FK_bads_attributes_1 | 4 | test.g.guy_id | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | b | ref | FK_bads_attributes_1 | FK_bads_attributes_1 | 4 | test.g.guy_id | 1 | Using index |
+----+--------------------+-------+------+-----------------------+-----------------------+---------+---------------+------+-------------+
5 rows in set (0.00 sec)
Well, it could be worse, note that I'm using InnoDB tables and I've declared foreing keys, whose indexes are picked up by the optimizer.
Let's a more SQLish version of this query
SELECT
g.guy_id,
g.guy_name,
COALESCE(
group_concat(b.attribute_name separator ', '),
group_concat(a.attribute_name separator ', ')
) right_attributes
FROM
guys g LEFT OUTER JOIN bads_attributes b
ON g.guy_id = b.guy_id
LEFT OUTER JOIN goods_attributes a
ON g.guy_id = a.guy_id
GROUP BY
g.guy_id,
g.guy_name
Output is the same
+--------+----------+---------------------------+
| guy_id | guy_name | right_attributes |
+--------+----------+---------------------------+
| 1 | Paolo | Fichissimo |
| 2 | Carlo | Fico |
| 3 | Ciccio | Ugly, Uglier |
| 4 | Bender | Ugliest |
| 5 | New kid | NULL |
+--------+----------+---------------------------+
5 rows in set (0.00 sec)
And the explain plan
+----+-------------+-------+------+-----------------------+-----------------------+---------+---------------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-----------------------+-----------------------+---------+---------------+------+----------------+
| 1 | SIMPLE | g | ALL | NULL | NULL | NULL | NULL | 5 | Using filesort |
| 1 | SIMPLE | b | ref | FK_bads_attributes_1 | FK_bads_attributes_1 | 4 | test.g.guy_id | 1 | |
| 1 | SIMPLE | a | ref | FK_goods_attributes_1 | FK_goods_attributes_1 | 4 | test.g.guy_id | 1 | |
+----+-------------+-------+------+-----------------------+-----------------------+---------+---------------+------+----------------+
3 rows in set (0.00 sec)
Well it looks certainly better, but there's a nasty filesort (this is a very small data set) which should be checked against a much larger dataset.
Doing the same on Firebird 2.1 beta 2 (which supports LIST() a function similar to MySQL's group_concat()) leads to:
SELECT
g.guy_id,
g.guy_name,
CASE
WHEN EXISTS (SELECT 0 FROM bads_attributes b WHERE b.guy_id = g.guy_id)
THEN (SELECT CAST(list(b.attribute_name) AS varchar(5000)) FROM bads_attributes b WHERE b.guy_id = g.guy_id
GROUP BY b.guy_id
)
WHEN EXISTS (SELECT 0 FROM goods_attributes a WHERE a.guy_id = g.guy_id)
THEN (SELECT CAST(list(a.attribute_name) AS varchar(5000)) FROM goods_attributes a WHERE a.guy_id = g.guy_id
GROUP BY a.guy_id
)
ELSE 'no attributes for this guy'
END right_attributes
FROM
guys g
and
SELECT
g.guy_id,
g.guy_name,
CAST(
COALESCE(
list(b.attribute_name),
list(a.attribute_name)
) AS varchar(5000)) right_attributes
FROM
guys g
LEFT OUTER JOIN bads_attributes b
ON g.guy_id = b.guy_id
LEFT OUTER JOIN goods_attributes a
ON g.guy_id = a.guy_id
GROUP BY
g.guy_id,
g.guy_name
Note that both queries need an explicit cast as list's results in Firebird are blobs.
The respective explain plans show that the set oriented one is better.
Prepare time: 00:00:00.
Field #01: GUYS.GUY_ID Alias:GUY_ID Type:INTEGER
Field #02: GUYS.GUY_NAME Alias:GUY_NAME Type:STRING(45)
Field #03: . Alias:RIGHT_ATTRIBUTES Type:STRING(5000)
PLAN (B INDEX (FK_BADS_ATTRIBUTES_1))
PLAN (B ORDER FK_BADS_ATTRIBUTES_1 INDEX (FK_BADS_ATTRIBUTES_1))
PLAN (A INDEX (FK_GOODS_ATTRIBUTES_1))
PLAN (A ORDER FK_GOODS_ATTRIBUTES_1 INDEX (FK_GOODS_ATTRIBUTES_1))
PLAN (G NATURAL)
Executing...
Done.
116 fetches, 0 marks, 0 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 21 index, 5 seq.
Delta memory: 54852 bytes.
Execute time: 00:00:00.
and
Prepare time: 00:00:00.
Field #01: GUYS.GUY_ID Alias:GUY_ID Type:INTEGER
Field #02: GUYS.GUY_NAME Alias:GUY_NAME Type:STRING(45)
Field #03: . Alias:RIGHT_ATTRIBUTES Type:STRING(5000)
PLAN JOIN (SORT (JOIN (G NATURAL, B INDEX (FK_BADS_ATTRIBUTES_1))), A INDEX (FK_GOODS_ATTRIBUTES_1))
Executing...
Done.
66 fetches, 0 marks, 0 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 9 index, 5 seq.
Delta memory: 38168 bytes.
Execute time: 00:00:00.