so many trails ... so little time: Of correlated and uncorrelated subqueries

Thursday, November 02, 2006

Of correlated and uncorrelated subqueries

Many SQL questions on Devshed forums are easily solved using correlated and uncorrelated subqueries, but many don't know of them at all, or at least are not shure about the syntax, usage and performance implications, they sometimes rave about a stored procedure to loop and select other values ...
The definition (taken from O'Reilly's "Learning SQL on SQL Server 2005") of a correlated subquery:
A correlated subquery is an inner subquery whose information is referenced by the main, outer query such that the inner query may be thought of as being executed repeatedly.
And the definition of an uncorrelated subquery:
A noncorrelated subquery is a subquery that is independent of the outer query. In other words, the subquery could be executed on its own.
This is all fine and dandy, but what's the use?
I'll put up two simple examples of usage, which should make clear which kind of questions can be answered with (un)correlated subqueries:

Example 1: get the number of "open" orders, the number of "waiting" orders and the open vs. waiting ratio

SELECT 
COUNT(*) AS o_open, 
(SELECT COUNT(*) FROM sales WHERE order_status = 'waiting') AS o_waiting,
(COUNT(*))/ 
(SELECT COUNT(*) FROM sales WHERE order_status = 'waiting') AS open_waiting_ratio
FROM SALES WHERE order_status = 'open';

You can see that the queries are uncorrelated, and they better be, as the where conditions are different.

Example 2: we want to see the customer number, the amount of the customer's last order and the total amount ordered by each customer

SELECT 
s.cust_no,
s.total_value AS order_value,
(SELECT SUM(i_s.total_value) FROM sales i_s WHERE i_s.cust_no = s.cust_no) total_ordered
FROM sales s
WHERE
s.order_date =
(SELECT MAX(ii_s.order_date) FROM sales ii_s WHERE ii_s.cust_no = s.cust_no);

You can see that both subqueries are correlated, the first one gets total amount ordered for each customer, the second one gets the last order date (field is datetime) and helps determine the last order in the main query.

This should make you aware that some kind of set based logic is often disguised as procedural and usually leads to ugly cursor usage in stored procedures instead of direct queries by the inexperienced.

Another example of a correlated subquery to display customers and one randomly selected purchase order for each of them:

SELECT 
 a.CUST_NO, 
 a.CUSTOMER,
 b.PO_NUMBER
FROM 
 CUSTOMER a
  INNER JOIN
   sales b
    ON a.CUST_NO = b.CUST_NO
WHERE 
 b.PO_NUMBER = 
(
  SELECT 
   first 1 c.PO_NUMBER 
  FROM 
   sales c 
  WHERE 
   c.CUST_NO = b.CUST_NO 
  ORDER BY 
   rand()
)

Both examples are run against Firebird's standard EMPLOYEE.FDB.

Correlated subqueries are also useful to specify updates and deletes on a table conditioned to values of another table (instead of joining them) see this example:

UPDATE 
 stops s
SET 
 s.inservice=1 
WHERE EXISTS
(
 SELECT
  ls.stopid
 FROM 
  linestop ls
 WHERE
  s.stopid=ls.stopid 
 AND 
  ls.signid=79
);

so many trails ... so little time

Thursday, November 02, 2006

Of correlated and uncorrelated subqueries

No comments:

About Me

Blog Archive

Links

Labels

Followers