Thursday, November 02, 2006

Of correlated and uncorrelated subqueries

Many SQL questions on Devshed forums are easily solved using correlated and uncorrelated subqueries, but many don't know of them at all, or at least are not shure about the syntax, usage and performance implications, they sometimes rave about a stored procedure to loop and select other values ...
The definition (taken from O'Reilly's "Learning SQL on SQL Server 2005") of a correlated subquery:
A correlated subquery is an inner subquery whose information is referenced by the main, outer query such that the inner query may be thought of as being executed repeatedly.
And the definition of an uncorrelated subquery:
A noncorrelated subquery is a subquery that is independent of the outer query. In other words, the subquery could be executed on its own.
This is all fine and dandy, but what's the use?
I'll put up two simple examples of usage, which should make clear which kind of questions can be answered with (un)correlated subqueries:

Example 1: get the number of "open" orders, the number of "waiting" orders and the open vs. waiting ratio

  1. SELECT
  2. COUNT(*) AS o_open,
  3. (SELECT COUNT(*) FROM sales WHERE order_status = 'waiting') AS o_waiting,
  4. (COUNT(*))/
  5. (SELECT COUNT(*) FROM sales WHERE order_status = 'waiting') AS open_waiting_ratio
  6. FROM SALES WHERE order_status = 'open';

You can see that the queries are uncorrelated, and they better be, as the where conditions are different.

Example 2: we want to see the customer number, the amount of the customer's last order and the total amount ordered by each customer

  1. SELECT
  2. s.cust_no,
  3. s.total_value AS order_value,
  4. (SELECT SUM(i_s.total_value) FROM sales i_s WHERE i_s.cust_no = s.cust_no) total_ordered
  5. FROM sales s
  6. WHERE
  7. s.order_date =
  8. (SELECT MAX(ii_s.order_date) FROM sales ii_s WHERE ii_s.cust_no = s.cust_no);

You can see that both subqueries are correlated, the first one gets total amount ordered for each customer, the second one gets the last order date (field is datetime) and helps determine the last order in the main query.

This should make you aware that some kind of set based logic is often disguised as procedural and usually leads to ugly cursor usage in stored procedures instead of direct queries by the inexperienced.

Another example of a correlated subquery to display customers and one randomly selected purchase order for each of them:

  1. SELECT
  2. a.CUST_NO,
  3. a.CUSTOMER,
  4. b.PO_NUMBER
  5. FROM
  6. CUSTOMER a
  7. INNER JOIN
  8. sales b
  9. ON a.CUST_NO = b.CUST_NO
  10. WHERE
  11. b.PO_NUMBER =
  12. (
  13. SELECT
  14. first 1 c.PO_NUMBER
  15. FROM
  16. sales c
  17. WHERE
  18. c.CUST_NO = b.CUST_NO
  19. ORDER BY
  20. rand()
  21. )


Both examples are run against Firebird's standard EMPLOYEE.FDB.


Correlated subqueries are also useful to specify updates and deletes on a table conditioned to values of another table (instead of joining them) see this example:

  1. UPDATE
  2. stops s
  3. SET
  4. s.inservice=1
  5. WHERE EXISTS
  6. (
  7. SELECT
  8. ls.stopid
  9. FROM
  10. linestop ls
  11. WHERE
  12. s.stopid=ls.stopid
  13. AND
  14. ls.signid=79
  15. );

No comments: