Count distinct window function in Redshift

bigdatamarkRedshift has a count() window function, but it doesn’t support counting distinct items. However, one can still count distinct items in a window by using another method.

The following table represents records with a measure of various items per day:

dateitemmeasure
2015-01-01table12
2015-01-01chair51
2015-01-01lamp8
2015-01-02table17
2015-01-02chair72
2015-01-02lamp23
2015-01-02bed1
2015-01-02dresser2
2015-01-03bed1

To get all of the data plus the number of distinct items measured per day, we would want to use this window function:

SELECT *
, COUNT(DISTINCT item) OVER(PARTITION BY DATE) AS distinct_count
FROM mytable;

However, this doesn’t work, because as of the time I’m writing this article, the Redshift documentation says “ALL is the default. DISTINCT is not supported.”

Instead, one will need to use the DENSE_RANK() function to get this count, with the item to be counted in the ORDER BY window clause.

SELECT *
, DENSE_RANK() OVER(PARTITION BY date ORDER BY item) AS distinct_count
FROM mytable;
dateitemmeasuredistinct_count
2015-01-01table123
2015-01-01chair513
2015-01-01lamp83
2015-01-02table175
2015-01-02chair725
2015-01-02lamp235
2015-01-02bed15
2015-01-02dresser25
2015-01-03bed11

Update: I’m told that this no longer works. It must have been an undocumented feature (bug). You now need to do another query on top of this result set.

2 Replies to “Count distinct window function in Redshift”

  1. Nice workaround. However I think you could use a better example by having duplicate items on a particular date. With the current example, regular count(*) window function would work as well.

Leave a Reply

Your email address will not be published. Required fields are marked *