Windowing functions – Who Corrupted SQL?

I hate writing sub-queries, but I seem to hate windowing functions even more! Take the following:

select
PR.ProfileName,
(select max(Created) from Photos P where P.ProfileID = PR.ID) as LastPhotoDate
from Profiles PR

In this example, I want to list all Profile names, and also include a statistic of the most recent uploaded photo. It’s quite easy and looks a little bloated, but compared to windowing functions, it is slower. Let’s have a look at the more performant alternative:

select
PR.ProfileName,
max(Created) OVER (PARTITION BY PR.ID) as LastPhotoDate
from Profiles PR
join Photos P
on P.ProfileID = PR.ID

That’s actually quite clear (if you are used to using windowing functions) and performs better. But it’s still not ideal, coders now need to learn about OVER and PARTITION just to do something seemingly trivial. SQL has let us down. It looks like someone who creates RDBMS’s told the SQL comittee to add windowing functions to the SQL standard, it’s not user friendly at all, computers are supposed to do the hard work for us!

It should look like this:

select
PR.ProfileName,
max(Created)
from Photos P
join Profiles PR
on PR.ID= P.ProfileID
Group By PR.ID --or Group By PR.ProfileName

I don’t see any reason why an RDBMS cannot make this work. I know that if a person gave me this instruction and I had a database, I would have no trouble. Of course, if different partitioning is required within the query, then there is the option for windowing functions, but for the stock standard challenges, keep the SQL simple!

Now what happens when you get a more difficult situation? What if you want to return the most recently uploaded photo (or at least the ID of the photo)?

--Get each profiles' most recent photo
select
PR.ProfileName,
P.PhotoFileName,
P.PhotoBlob
from Photos P
join Profiles PR
on PR.ID= P.ProfileID
join (
select ProfileID, max(Created) as Created
from Photos
group by ProfileID
) X
on X.ProfileID = P.ProfileID
and X.Created = P.Created

It works but it’s awkward and has potential for performance problems. From my limited experience with windowing functions and short search on the web, I couldn’t find a windowing function solution. But again, there’s no reason an RDBMS can’t make it easy for us, and again the SQL language should make it easy for us!

Why can’t the SQL standards group innovate? Something like this:

select
ProfileName,
P.PhotoFileName,
P.PhotoBlob
from Photos P
join Profiles PR
on PR.ID= P.ProfileID
group by ProfileID
being max(Created) --reads: being a record which has the maximum created field value

And leave it to the RDBMS to decide how to make it work? In procedural coding with a set, while you are searching for a maximum value you can also store the entity which has that maximum. There’s no reason this can’t work.

It seems the limitation is the SQL standardisation body. I guess someone could always implement a work around, create a plugin for opensource SQL query tools, as well as opensource functions to convert SQL+ [with such abilities as introduced above] to SQL.

(By the way I have by no means completely thought out all the above, but I hope it describes the spirit of my frustrations and of the possible solution – I hope some RDBMS experts can comment on this dilemma)

Leave a Reply

Your email address will not be published. Required fields are marked *