Visualize and analyze clickstream data with D3.js and Dagre

Do you need to do clickstream analysis? Use this free tool to visualize clickstream data. Perform customer journey analysis and find the “happy path”.

When I was tasked with helping the UX team find where people were falling out of the “funnel”, I built an analysis pipeline and, while I was at it, a way to visualize the clickstream data.

Preparing the clickstream data

Using Adobe Analytics as the raw data source, I processed the data with Python/pandas and identified the paths taken through the website as well as the frequency of each path.

To simplify the data for presentation, I removed (but logged) the steps where users backtracked to a page they had already visited. For example, suppose a user’s actual journey was this:

home > products > product 1 > home > products > product 2

However, I output their path as:

home > products > product 1 | products > product 2

While I could shorten this further by dropping every page the user had already visited, I use the pipe symbol so that I don’t lose the data on where the user was coming from when they started visiting new pages again. Additionally, the pipe stands out to indicate that the page before the pipe caused the user to backtrack, which could indicate an issue with that page.
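A minimal Python sketch of this collapsing step follows. It is not the original pandas script: the function name `collapse_backtracks` is hypothetical, and the edge-case handling (revisits at the very end of a session are dropped, and only the last revisited page is kept after the pipe) is an assumption inferred from the example above.

```python
def collapse_backtracks(pages):
    """Collapse revisited pages into the pipe notation shown above.

    Returns the collapsed path string and the list of pages from which
    the user turned back (the page just before each pipe).
    """
    seen = set()
    segments = [[]]          # each segment is a run of newly visited pages
    last_revisited = None    # most recent already-seen page while backtracking
    backtrack_sources = []   # logged so the removed information is not lost

    for page in pages:
        if page in seen:
            # First revisit after new pages: log where the user turned back.
            if last_revisited is None and segments[-1]:
                backtrack_sources.append(segments[-1][-1])
            last_revisited = page
            continue
        if last_revisited is not None:
            # The user is exploring again: start a new segment that begins
            # with the page they came from (assumption based on the example).
            segments.append([last_revisited])
            last_revisited = None
        segments[-1].append(page)
        seen.add(page)

    path = " | ".join(" > ".join(segment) for segment in segments)
    return path, backtrack_sources


journey = ["home", "products", "product 1", "home", "products", "product 2"]
path, turned_back_from = collapse_backtracks(journey)
print(path)
print(turned_back_from)
```

For the journey above, `turned_back_from` contains "product 1", the page before the pipe.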

I also removed single-page visits (bounces) but logged them.

Finally, I output several lists:

  • Page name where the user “bounced” and frequency of bounces from that page.
  • Page name where the user backtracked and frequency of changes in direction from that page.
  • Page name where the user exited and frequency of exits from that page.
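As a rough illustration, the three lists could be computed along these lines. This is a sketch, not the original script: `summarize_sessions` is a hypothetical name, each session is assumed to be an ordered list of page names, and a "backtrack" is assumed to be counted once per change in direction, against the page the user was on just before revisiting an earlier page.

```python
from collections import Counter


def summarize_sessions(sessions):
    """Count bounces, backtracks, and exits per page name."""
    bounces, backtracks, exits = Counter(), Counter(), Counter()

    for pages in sessions:
        if not pages:
            continue
        if len(pages) == 1:
            # Single-page visit: log it as a bounce and skip the rest.
            bounces[pages[0]] += 1
            continue

        exits[pages[-1]] += 1

        seen = set()
        previous = None
        in_backtrack = False
        for page in pages:
            if page in seen:
                if not in_backtrack:
                    # The page before the first revisit caused the backtrack.
                    backtracks[previous] += 1
                    in_backtrack = True
            else:
                seen.add(page)
                in_backtrack = False
            previous = page

    return bounces, backtracks, exits


sessions = [
    ["home"],
    ["home", "products", "product 1", "home", "products", "product 2"],
    ["home", "products"],
]
bounces, backtracks, exits = summarize_sessions(sessions)
print(bounces.most_common())
print(backtracks.most_common())
print(exits.most_common())
```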

I added metadata to each record, including the platform. For example, a clickstream record with its platform and frequency looks like this:

mobile, home > products > product 1 | products > product 2, 70
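Rows in this shape can be produced by counting identical (platform, path) pairs. The sketch below assumes the three fields are platform, path, and frequency, and that each session has already been reduced to a path string; the name `to_rows` is hypothetical.

```python
from collections import Counter


def to_rows(records):
    """Turn (platform, path) pairs into 'platform, path, frequency' rows.

    Rows are ordered from most to least frequent.
    """
    counts = Counter(records)
    return [
        f"{platform}, {path}, {frequency}"
        for (platform, path), frequency in counts.most_common()
    ]


records = (
    [("mobile", "home > products")] * 3
    + [("desktop", "home > products")]
)
for row in to_rows(records):
    print(row)
```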

You can see an example of the results file here.

Here’s what the clickstream file looks like when analyzed in an Excel PivotTable:

Excel PivotTable to analyze clickstreams

The Excel formula to populate the “entrypage” column from the “clickstream” column:

=LEFT(C2,SEARCH(">",C2,1)-1)
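If you would rather derive the entry page in Python than in Excel, a rough equivalent is shown below. The function name `entry_page` is hypothetical. Unlike the formula, this version also trims the space before the ">" and returns the whole string when there is no ">" at all.

```python
def entry_page(clickstream):
    """Return the first page of a clickstream path string."""
    # Take the text before the first ">" and strip surrounding spaces.
    return clickstream.split(">", 1)[0].strip()


print(entry_page("home > products > product 1 | products > product 2"))
```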

If anyone is interested in the Python/pandas script, please let me know and I can share it.

Visualizing the clickstream data

I didn’t want to stop there, though. I wanted to view these unique clickstreams visually and combined into a single graph, so I built this tool using D3.js and Dagre.

Visualize clickstream data for analysis using D3.js and Dagre

https://www.bigdatamark.com/clickstream_analysis.html
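To combine many clickstreams into one graph, each path has to be broken into page-to-page links with a weight. The sketch below shows one way to do that in Python. It does not describe how the tool itself ingests data: the name `build_edges` is hypothetical, and the decision not to draw a link across a pipe is an assumption.

```python
from collections import Counter


def build_edges(rows):
    """Aggregate (path, frequency) rows into weighted directed edges."""
    weights = Counter()
    for path, frequency in rows:
        # Assumption: a pipe separates runs of pages, with no link across it.
        for segment in path.split("|"):
            pages = [p.strip() for p in segment.split(">") if p.strip()]
            for source, target in zip(pages, pages[1:]):
                weights[(source, target)] += frequency
    return weights


rows = [
    ("home > products > product 1 | products > product 2", 70),
    ("home > products", 30),
]
for (source, target), weight in build_edges(rows).items():
    print(source, "->", target, weight)
```

The resulting nodes and weighted edges are the kind of input a layout library such as Dagre positions before D3.js draws them.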

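If you’re not familiar with Dagre, it’s a JavaScript library for laying out directed graphs, and the best one I’ve found for complex graphs with circular links between nodes. It’s freely available under the MIT license. The Dagre project describes the algorithms behind it as follows: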

The general skeleton for Dagre comes from Gansner, et al., “A Technique for Drawing Directed Graphs”, which gives an excellent high-level overview of the phases involved in layered drawing and dives into the details and problems of each phase. Besides the basic skeleton, we specifically used the technique described in the paper to produce an acyclic graph, and we use the network simplex algorithm for ranking. If there is one paper to start with when learning about layered graph drawing, this is it!
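Clickstreams are full of cycles (home to products and back again), so the acyclic step matters here. The idea from the paper, reversing the back edges found by a depth-first search, can be sketched in Python as follows. This illustrates the technique only and is not Dagre’s code; `reverse_back_edges` is a hypothetical name, and self-loops are simply set aside.

```python
def reverse_back_edges(nodes, edges):
    """Make a directed graph acyclic by reversing DFS back edges.

    Returns the acyclic edge list and the original edges that were reversed.
    """
    adjacency = {node: [] for node in nodes}
    for source, target in edges:
        if source != target:  # self-loops are ignored in this sketch
            adjacency[source].append(target)

    visited, on_stack = set(), set()
    acyclic, reversed_edges = [], []

    def visit(node):
        visited.add(node)
        on_stack.add(node)
        for target in adjacency[node]:
            if target in on_stack:
                # Back edge: it closes a cycle, so flip its direction.
                reversed_edges.append((node, target))
                acyclic.append((target, node))
            else:
                acyclic.append((node, target))
                if target not in visited:
                    visit(target)
        on_stack.discard(node)

    for node in nodes:
        if node not in visited:
            visit(node)
    return acyclic, reversed_edges


nodes = ["home", "products", "product 1"]
edges = [("home", "products"), ("products", "product 1"), ("product 1", "home")]
acyclic, flipped = reverse_back_edges(nodes, edges)
print(acyclic)
print(flipped)
```

A layout would remember the flipped edges and restore their original direction when drawing.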

For crossing minimization we used Jünger and Mutzel, “2-Layer Straightline Crossing Minimization”, which provides a comparison of the performance of various heuristics and exact algorithms for crossing minimization.

For counting the number of edge crossings between two layers we use the O(|E| log |V_small|) algorithm described in Barth, et al., “Simple and Efficient Bilayer Cross Counting”.
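To give a feel for what counting crossings between two layers involves, here is a Python sketch that sorts the edges by their position in the upper layer and then counts inversions among the lower-layer positions with a Fenwick (binary indexed) tree. It is in the same spirit as the paper’s approach but is not the accumulator tree from Barth, et al., nor Dagre’s implementation; the name `count_crossings` is hypothetical.

```python
def count_crossings(edges, north_order, south_order):
    """Count edge crossings between two adjacent layers.

    edges: (north_node, south_node) pairs.
    north_order / south_order: nodes of each layer, left to right.
    """
    north_pos = {node: i for i, node in enumerate(north_order)}
    south_pos = {node: i for i, node in enumerate(south_order)}

    # Sort by north position, then south position. Two edges cross exactly
    # when their south positions appear in strictly inverted order.
    sequence = sorted((north_pos[u], south_pos[v]) for u, v in edges)

    size = len(south_order)
    tree = [0] * (size + 1)  # Fenwick tree over south positions (1-based)
    crossings = 0

    for inserted, (_, south) in enumerate(sequence):
        # Count earlier edges whose south position is <= this one.
        i = south + 1
        not_greater = 0
        while i > 0:
            not_greater += tree[i]
            i -= i & -i
        # Every other earlier edge ends further right, so it crosses this one.
        crossings += inserted - not_greater

        # Record this edge's south position.
        i = south + 1
        while i <= size:
            tree[i] += 1
            i += i & -i

    return crossings


print(count_crossings([("a", "y"), ("b", "x")], ["a", "b"], ["x", "y"]))
```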

For positioning (or coordinate assignment), we derived our algorithm from Brandes and Köpf, “Fast and Simple Horizontal Coordinate Assignment”. We made some adjustments to get tighter graphs when node and edge sizes vary greatly.

The implementation for clustering derives extensively from Sander, “Layout of Compound Directed Graphs.” It is an excellent paper that details the impact of clustering on all phases of layout and also covers many of the associated problems. Crossing reduction with clustered graphs derives from two papers by Michael Forster, “Applying Crossing Reduction Strategies to Layered Compound Graphs” and “A Fast and Simple Heuristic for Constrained Two-Level Crossing Reduction.”

I hope you’ll enjoy using this free tool to visualize clickstream data, map your customer journey, and find the “happy path”. Please let me know if you have any questions. Thanks!
