Monthly Archives: July 2007

Groupwise maximum

Recently, I’ve been working on a new project, an update to the Magdalen College JCR Rooms Database. Doing this whilst on holiday with no internet connection certainly made me appreciate the reference panel built into Dreamweaver. However, it didn’t help me with the following problem: to simplify slightly, suppose you have a database table like so:

+----+---------+---------+
| id | version | name    |
+----+---------+---------+
|  0 |       1 | apple   |
|  0 |       2 | apple2  |
|  1 |       1 | orange  |
|  1 |       2 | orange2 |
+----+---------+---------+

Let’s call it fruit. The primary key is the composition of the id and version fields, each group of rows with the same id being different versions of the same object’s details. Possibly not the best idea in the world, in retrospect, but it was too late now.

What I wanted to do was get the row with maximum revision within each id, i.e. a single SQL query which would return

+----+---------+
| id | name    |
+----+---------+
|  0 | apple2  |
|  1 | orange2 |
+----+---------+

Getting the maximum version number within each id is, of course, as simple as

SELECT id, MAX(revision) from fruit GROUP BY id

.
Then you could do a second query like

SELECT * FROM fruit WHERE id = '$id' AND version='$ver'

to get the rest of each row, plugging in the maximum version from the first query and working through the rows one at a time…which is fine, except that there are 250 of them in the real thing, and that’s only going to increase.

Methods involving multiple separate queries were out. Quite apart from the performance issues, I wanted a single query to allow me to search through the latest revisions of all rows by adding to the WHERE clause.

After much further head-scratching, I came up with

SELECT * FROM fruit GROUP BY id HAVING version=MAX(version)

…which returned precisely the square root of stuff all. Initial investigations when back at an internet connection led me to blame this on the rather ancient version of MySQL doing service as development SQL server on my laptop; however, upgrading to the latest stable version didn’t help.

What did help, though – and at last we reach the point of this rather large post – were these pages, which pointed me in the direction of this one, and told me everything I ever wanted to know about the “groupwise maximum” problem, as I discovered it was called. The solution I’ll be using, paraphrased from the examples on that last page, is

SELECT fruit1.id, fruit1.name from fruit AS fruit1,
(SELECT id, MAX(version) AS ver FROM fruit GROUP BY id) AS fruit2
WHERE fruit1.id=fruit2.id
AND fruit1.version = fruit2.ver;

The bracketed expression is a sub-query, which internally creates a temporary table to match the maximum versions from. Searching through the results is just a case of adding AND fruit1.field = ‘value’ to the outer WHERE clause.

All of which is quite enough SQL for one day, and leaves me realising I still have much to learn about it.