Tag Archives: Cartesian

SQL – Inserting a subtotal row above results

sysadminThis how-to post will provide a tutorial on adding a subtotal row above your results using SQL with a Cartesian join and window functions.

Here is our data:

productnum_sold
a1
a3
a5
b2
b4
b8

Here is the output that we want:

productnum_sold
total for a9
a1
a3
a5
total for b14
b2
b4
b8

One way to do an insert in SQL would be to use table generating functions, like those built into Hive. However, many SQL databases don’t have any of these functions built in. Amazon Redshift is unlikely to ever support explode() or any similar functions.

The natural way to insert rows though, is to use standard SQL and a Cartesian join. Although all joins in SQL are Cartesian joins, the term is often reserved for a join that utilizes a Cartesian product to get a set that is larger than any of the sets being joined.

Often, a Cartesian join is unintended, and can cause duplicate rows if the join predicate matches multiple rows because the key being joined on is not unique.

To do a Cartesian join to add rows above your results, you would create a helper table like this:

uniq_idrow_count
01
11

Then use SQL like this:

SELECT CASE
         WHEN b.uniq_id = 0 THEN 'total for ' ||a.product
         ELSE a.product
       END AS product,
       CASE
         WHEN b.uniq_id = 0 THEN a.total
         ELSE a.num_sold
       END AS num_sold
FROM (SELECT *,
             SUM(1) OVER (PARTITION BY product ORDER BY product ROWS UNBOUNDED PRECEDING) AS row_count,
             SUM(num_sold) OVER (PARTITION BY product ORDER BY product ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total
      FROM product_sold_table) a
  LEFT JOIN helper_table b ON (a.row_count = b.row_count)
ORDER BY a.product,
         COALESCE(b.uniq_id,1),
         a.num_sold