Cartesian – BigDataMark

This how-to post will provide a tutorial on adding a subtotal row above your results using SQL with a Cartesian join and window functions.

Here is our data:

product	num_sold
a	1
a	3
a	5
b	2
b	4
b	8

Here is the output that we want:

product	num_sold
total for a	9
a	1
a	3
a	5
total for b	14
b	2
b	4
b	8

One way to do an insert in SQL would be to use table generating functions, like those built into Hive. However, many SQL databases don’t have any of these functions built in. Amazon Redshift is unlikely to ever support explode() or any similar functions.

The natural way to insert rows though, is to use standard SQL and a Cartesian join. Although all joins in SQL are Cartesian joins, the term is often reserved for a join that utilizes a Cartesian product to get a set that is larger than any of the sets being joined.

Often, a Cartesian join is unintended, and can cause duplicate rows if the join predicate matches multiple rows because the key being joined on is not unique.

To do a Cartesian join to add rows above your results, you would create a helper table like this:

uniq_id	row_count
0	1
1	1

Then use SQL like this:

SELECT CASE
         WHEN b.uniq_id = 0 THEN 'total for ' ||a.product
         ELSE a.product
       END AS product,
       CASE
         WHEN b.uniq_id = 0 THEN a.total
         ELSE a.num_sold
       END AS num_sold
FROM (SELECT *,
             SUM(1) OVER (PARTITION BY product ORDER BY product ROWS UNBOUNDED PRECEDING) AS row_count,
             SUM(num_sold) OVER (PARTITION BY product ORDER BY product ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total
      FROM product_sold_table) a
  LEFT JOIN helper_table b ON (a.row_count = b.row_count)
ORDER BY a.product,
         COALESCE(b.uniq_id,1),
         a.num_sold

Tag: Cartesian

SQL – Inserting a subtotal row above results

Share this: