Monday, June 23, 2008

Don't silently turn your outer joins into inner joins ...

Surprised by outer joins not returning the expected results? Looks like they behave like an inner join?

You are probably messing with the join and the where clause.

That's also why it's important to use the full join notation for every join instead of the where based one (often used for inner joins) as it allows for a better separation of the join and the where clause.

Remember that the where clause is applied after the join clause, so if you want to restrict results from the table on the "right" of the left outer join but still have the NULLs you'll have to put the restriction in the join condition, putting it in the where clause will turn your outer join into an inner join.

Not convinced? Let's see an example:

I'll query two tables of the Firebird sample database, EMPLOYEE and EMPLOYEE_PROJECT, I'm looking for a list of employees assigned to a specific project and those not assigned (probably I'll pick a few of them and assign those too), let's see the first query

  1. SELECT
  2. e.EMP_NO,
  3. e.FULL_NAME,
  4. ep.PROJ_ID
  5. FROM
  6. EMPLOYEE e LEFT OUTER JOIN EMPLOYEE_PROJECT ep
  7. ON e.EMP_NO = ep.EMP_NO;



and it's output is

2 Nelson, Robert [null]
4 Young, Bruce VBASE
4 Young, Bruce MAPDB
5 Lambert, Kim [null]
8 Johnson, Leslie VBASE
8 Johnson, Leslie GUIDE
8 Johnson, Leslie MKTPR
9 Forest, Phil [null]
11 Weston, K. J. [null]
12 Lee, Terri MKTPR
14 Hall, Stewart MKTPR

So far so good, now I want only those on the VBASE project and the unassigned ones, let's do what instinct suggests, add a where clause, the outer join will take care of the rest (...)

  1. SELECT
  2. e.EMP_NO,
  3. e.FULL_NAME,
  4. ep.PROJ_ID
  5. FROM EMPLOYEE e LEFT OUTER JOIN EMPLOYEE_PROJECT ep
  6. ON e.EMP_NO = ep.EMP_NO
  7. WHERE
  8. ep.PROJ_ID = 'VBASE'


Result is:

4 Young, Bruce VBASE
8 Johnson, Leslie VBASE
15 Young, Katherine VBASE
44 Phong, Leslie VBASE
45 Ramanathan, Ashok VBASE
71 Burbank, Jennifer M. VBASE
83 Bishop, Dana VBASE
136 Johnson, Scott VBASE
138 Green, T.J. VBASE
145 Guckenheimer, Mark VBASE

Eeek, where are all the unassigned gone???
That's because the WHERE is applied after the JOIN and so it throws away all those NULLs, infact it's bound to pick only PROJ_ID = 'VBASE'
Now the right one, we want it executed in the join

  1. SELECT
  2. e.EMP_NO,
  3. e.FULL_NAME,
  4. ep.PROJ_ID
  5. FROM EMPLOYEE e LEFT OUTER JOIN EMPLOYEE_PROJECT ep
  6. ON e.EMP_NO = ep.EMP_NO
  7. AND
  8. ep.PROJ_ID = 'VBASE'



And the result is just what we are looking for:

2 Nelson, Robert [null]
4 Young, Bruce VBASE
5 Lambert, Kim [null]
8 Johnson, Leslie VBASE
9 Forest, Phil [null]
11 Weston, K. J. [null]
12 Lee, Terri [null]
14 Hall, Stewart [null]
15 Young, Katherine VBASE
20 Papadopoulos, Chris [null]
24 Fisher, Pete [null]
28 Bennet, Ann [null]
29 De Souza, Roger [null]

2 comments:

Unknown said...

"So far so good, now I want only those on the VBASE project and the unassigned ones".
This means the where clause perhaps is some like "ep.PROJ_ID = 'VBASE' or ep.PROJ_ID is null", I think...

pabloj said...

Yes, but the point of this post is to raise awareness on the side effects of a "quick and dirty" where clause