The Friday Fragment

Published on 24 April 2010
This post thumbnail

It's Friday, and time again for a new Fragment: my weekly programming-related puzzle.

This Week's Fragment

Owen Moore at the IRS needs our help again. He wants to add mailing addresses to his "pink slip pick list" report. Easy enough: he just added in the EmployeeAddress table to get these fields. But when he ran it, he found that it dropped some of the employees. He doesn't mind having multiple rows for one employee (whenever there are multiple addresses), but the IRS never wants to overlook a taxpayer.

Upon investigation, he discovered that the dropped ones didn't have addresses on file. Here is his SQL:

select companyid, e.employeeid, hiredate, address
from employee e, employeeaddress ea
where e.employeeid = ea.employeeid
and ...
order by companyid, hiredate

Can you help Owen correct his mistake?  The "..." is the rest of his where clause (see solution below). Since it's not important to the solution, it's omitted here.

If you want to “play along”, post the solution as a comment or send it via email. You can use the attached file for sample data. To avoid “spoilers”, simply don’t expand comments for this post. Owen promises to add you to the "do not audit" list if you can help.

Last Week's Fragment - Solution

Last week's fragment was missing from a SQL statement. IRS programmer Owen Moore needed a report of all but the 49 most senior employees of each company. That is, fill in the dot, dot, dots (ellipsis) here:

select  companyid, employeeid, hiredate
from employee e
where ...
order by companyid, hiredate

Fortunately, Owen had a couple strokes of luck. First, during his lunch break, the fast food drive-through attendant asked, "would you like a correlated subquery with that?"  Not knowing what such a thing was, he said "yes", and it turned out to be just the thing he needed. Second, upon arriving back at work, he was greeted with a couple of emails (including one from Spencer) suggesting a SQL like the following:

select companyid, employeeid, hiredate
from employee e
where 49 >
  (select count(*) from employee ei
   where ei.companyid = e.companyid and
   ei.hiredate > e.hiredate)
order by companyid, hiredate

That got him in the ballpark, but, alas, there was a gap:  it revealed the 49 newest employees (a good thing), but not all but the 49 oldest. Well, Owen pulled up Google Translate and found that "all but" in English loosely translates to "not exists" in SQL. So he wrapped an outer select around it, flipped the greater than sign (to exclude older ones), and came up with the following:

select companyid, employeeid, hiredate
from employee e
where not exists
  (select *
   from employee ei
   where ei.companyid = e.companyid
   and ei.employeeid = e.employeeid
   and 49 >
     (select count(*) from employee eii
      where eii.companyid = ei.companyid and
      eii.hiredate < ei.hiredate))

By the way, the "i" suffixes on table aliases mean "inner" (so "ei" is "employee inner" and "eii" is "employee inner-inner"), just a convention.

Owen has a "make it right, then make it fast" mentality, so he'll consider tuning/rewriting later if performance is bad. But if you're interested in tuning it, he attached a script to create the table, load some sample data, and run the SQL. This script also has SQLs and data to work on this week's fragment.