How we measure success with partial data: Taming the Wild West of product analytics at Atlassian

Co-authored by Michael Ebner and Nina Kumagai

The ‘Wild West’ days of Atlassian analytics; you can practically see the saloon and rolling tumbleweed. OK, maybe not. But it’s just so hard to get real 1800s-era photos of software developers coding in a Wild West saloon. (Atlassians meet in San Francisco, 2019.)

In late 2016, we shipped a game-changing improvement for Confluence. Collaborative editing. Two years of sweat and tears went into this highly anticipated new feature that would enable multiple users to edit pages simultaneously. We can still feel the mix of excitement and anxiety in our Sydney office the day we ‘pushed the button’ and made this version available for download. Done! Shipped! A huge success!

Hold on. There was a small catch. We didn’t actually know whether it was a huge success. We didn’t know who was using it. Because we didn’t actually know who could use it.

Like most early-stage companies, 2016 Atlassian was resource-constrained, so everyone was expected to wear many hats, from development, to design, to analytics and insights. What we lacked in formal processes, we made up for in gusto and drive. The measurement plan for collaborative editing, for example, included roughly 20 metrics! To chart these metrics we engineered a bunch of analytic ‘events’ — small packets of data that are triggered by a user action and subsequently sent to us through a data pipeline, eventually ending up in our centralized data lake for analysis. In the lead-up to our release date we had focused heavily on the usage aspects of collaborative editing but overlooked a crucial step: whether the customer actually turned it on.
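
To make the idea concrete, an analytics event of this kind is just a small, anonymous payload. Here is a minimal sketch in Python of what one might look like; the field and event names are ours for illustration, not Atlassian's actual schema.

```python
# A minimal sketch of an analytics 'event' payload.
# Field and event names here are illustrative, not Atlassian's real schema.
import json
import time
import uuid

def build_event(action: str, anonymous_site_id: str) -> dict:
    """Build a small, anonymous event payload for a user action."""
    return {
        "event": action,                         # e.g. "confluence.page.edited"
        "anonymous_site_id": anonymous_site_id,  # opaque ID, no customer name
        "timestamp": int(time.time()),
        "event_id": str(uuid.uuid4()),
        # Deliberately no page content, user names, or location data.
    }

event = build_event("confluence.page.edited", anonymous_site_id="site-7f3a")
print(json.dumps(event))  # this payload would travel down the pipeline to the data lake
```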

Atlassian analytics are always anonymous and generic in nature. We never collect what’s on a Confluence page or Jira ticket. We do not collect any names, nor any information about a user’s location. Essentially we know, for example, when someone creates a ticket, or when someone edits a Confluence page, but nothing about the person or the page itself. These types of analytics allow us to design the products better, as we get an understanding of how our customers actually use them.

Whilst the team put on a brave face, Rex was clearly disappointed when he found out we couldn’t tell which customers were using Collaborative Editing. He was insufferable for about a week.

The fix, unfortunately, was a slow process. Collaborative editing was shipped in our Server and Data Center products (we call them ‘Behind the Firewall’ deployment options). In the ‘Behind the Firewall’ world, we can only add ‘new analytics’ to newly released versions. We can’t go back in time and retroactively add analytics to a customer’s Confluence site that they have already downloaded and set up. So, we added new analytics to the next version we released and had to wait for our customers to upgrade their Confluence sites. That process takes a lot of time, sometimes years.

The above example highlights one of the major challenges we had in the early days of Atlassian’s rapid growth. But, in a way, it acted as a wake-up call. It forced us to tame the wild west of analytics and add some process.

As a rapidly growing company, we realized we had to figure out a way to:

  1. capture strong success measures that cover the essential aspects of a feature — including, as you can imagine, which customers were turning it on,
  2. align our development teams on the required analytics events to compute the above measures, and
  3. close the loop and iterate by assessing whether we met our target goals from (1) and whether the feature actually addressed the needs of the customer.

In 2016, the authors were new to the company. After collaborative editing shipped — and we learned our lesson — we implemented the Atlassian Goals, Signals, and Measures (GSM) Play across our entire Behind the Firewall analytics team.

Originally conceived by Google, the GSM Play is a thought exercise that helps teams align on a goal and define the metrics to measure it. In the example at the start of this article, the teams focused on measuring output. With the introduction of GSM, we created a way to track progress towards a goal and distinguish the signals from the noise whilst doing so.

GOAL encapsulates the problem and how we are going to solve it. It’s high level, often not measurable, but is the overall north star when asking ourselves — “Why are we building this again?”

SIGNALS represent the behavior that we expect from users once they are exposed to the feature. A goal can have multiple signals. They’re still not usually measurable.

MEASURES are used to, well, measure. Does the feature live up to our expectations? Measures are numerical, measurable, and can be charted and examined.

In a nutshell, the Goal aligns the team on why the feature is being built. Once a Goal has been set, we think about how the success or failure of that goal would be Signaled in the customer’s behavior or attitude. We then decide how these Signals can be converted into specific numerical Measures.

As Data Analysts, this diagram literally took us 3 weeks to design. Admire it. HMU for any design tips and tricks.

At Atlassian, not only do we have many products, we also have a few deployment options for those products. While most of our customers choose Cloud, many of our large and complex customers choose to deploy on Data Center. Think — ‘The Fortune 500’, Governments, and the world’s banks and insurance houses. New features are designed to solve these customers’ needs. But what are the needs of these particular customers and, to the point of this article, how do we know if our new features are meeting those needs?

Shipping Content Delivery Network (CDN) support answered the requests of our large, distributed customers with a much-needed new feature, and gave the analytics team an opportunity to flex its measurement muscles.

CDN is an integration that allows static content from a customer’s Confluence, Jira, and Bitbucket sites to be stored on dedicated servers physically closer to where a user is accessing the site, speeding up the delivery of the content to the user.

Overall success for these types of opt-in features isn’t obvious to measure. It’s not an actively used feature, meaning Monthly Active Users (MAU) is out, nor does adoption alone capture the full picture of consistent usage. So how did we do it?

Any challenge can be solved with a whiteboard and a little ❤️.

Before starting work on a new feature, before a single line of code is written, Product Managers, Engineering Leads, and Analytics assemble to align on project goals. It’s important to call out here that the product teams themselves align on what they would consider the goal(s) of the project to be, with the help of Analytics. Not the other way around.

We generally believe that 1–3 goals are appropriate for a feature.

This is a development-team-driven exercise. It is up to the team to decide why they are building the thing they are building. Analysts are there to guide the discussion and make sure that the goals make sense and align with the overall strategy.

We usually set aside 60 minutes for a GSM Play with the entire team. Afterwards, the team’s Product Manager continues to develop the ideas from the session into a coherent document on a Confluence page. This can take several days.

For our Content Delivery Network (CDN) Feature, the Goal was to increase performance across our customers’ Data Center sites. Simple, yes. But not exactly measurable. It was our North Star and the reason we invested in building CDN integration into our products, but what does it actually mean?

We could examine our Goal using three different Signals:

  1. Site Admins from our customers’ largest and most complex Confluence and Jira sites understood the value proposition of CDN, and the feature was easy to enable in our products,
  2. The feature worked and performance improvements were made, and
  3. Customers were happy with the feature, and the value it provided.

Again, not exactly measurable.

But here’s where the value is. The above three Signals provide an excellent framework to construct some real numerical Measures. We usually implement several Measures for a feature, sometimes dozens, that all address the agreed-upon Signals.
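
To make the hierarchy concrete, here is one way the CDN GSM could be jotted down as a structure before any Measures are filled in. This is a sketch of the shape of a GSM Play in Python, not the actual artifact we produced.

```python
# A sketch of the GSM hierarchy: one Goal, several Signals, Measures to follow.
from dataclasses import dataclass, field

@dataclass
class Signal:
    description: str
    measures: list[str] = field(default_factory=list)  # filled in as the team agrees on them

@dataclass
class GSMPlay:
    goal: str
    signals: list[Signal]

cdn_gsm = GSMPlay(
    goal="Increase performance across our customers' Data Center sites",
    signals=[
        Signal("Admins of large, complex sites understand the value of CDN and can easily enable it"),
        Signal("The feature works and performance improvements are made"),
        Signal("Customers are happy with the feature and the value it provides"),
    ],
)
print(cdn_gsm.goal)
```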

We knew that if our customers understood the value proposition of the feature then, well, they’d turn it on. It’s an Adoption metric, but with the caveat that we are specifically measuring success against a subset of our customers — the largest. So we set a Measure that examined the customers who enabled CDN as a proportion of customers with large licenses and many users.

I honestly struggled to find an image that conveyed “dedicated servers physically closer to where a user is accessing a site, speeding up the delivery of the content to that user across large distances”. Then I saw this thing sitting in our library. What a win!

For CDN we crafted a metric, and a success target, that essentially said we wanted to get X % of customers, with at least Y users, enabling CDN on their instances. Super simple. But already we have a framework that our development team can use to make sure the right analytics events get implemented, and we avoid the mistakes highlighted at the start of this article. The product team now needs to think about the different ways CDN can be enabled. Do we implement an event when an admin clicks ‘enable’, or an event when we know CDN is actually working? Do we need a new event to tell us how many users the site has? These are all questions a developer needs to consider when implementing the right analytics for a feature.
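
As an illustration, once the right events exist, the adoption Measure could be computed roughly as follows. The thresholds, field names, and data shape below are assumptions made for the sake of the sketch.

```python
# Illustrative sketch of the adoption Measure: of customers with at least
# MIN_USERS users, what share have CDN enabled? Names and thresholds are invented.
MIN_USERS = 500          # the 'Y users' threshold from the Measure
TARGET_RATE = 0.20       # the 'X %' success target

def cdn_adoption_rate(sites: list[dict]) -> float:
    """sites: one record per customer site, e.g. {"users": 1200, "cdn_enabled": True}."""
    eligible = [s for s in sites if s["users"] >= MIN_USERS]
    if not eligible:
        return 0.0
    enabled = [s for s in eligible if s["cdn_enabled"]]
    return len(enabled) / len(eligible)

sites = [
    {"users": 1200, "cdn_enabled": True},
    {"users": 800,  "cdn_enabled": False},
    {"users": 50,   "cdn_enabled": True},   # below threshold, not counted
]
rate = cdn_adoption_rate(sites)
print(f"adoption: {rate:.0%}, target met: {rate >= TARGET_RATE}")
```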

Similarly, for Signal Two we concluded that performance, as perceived by end-users, is a function of the latency from client → server → client plus the time taken for page elements to render and become usable. We call this ‘Transport and Latency Times’. We said we’d be happy if “the 90th percentile of Transport and Latency times for customers with Y users was reduced by X % where CDN is enabled”.
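
A rough sketch of how that performance Measure could be computed, assuming we have latency samples from before and after CDN was enabled; the data shape and the sample values here are assumptions, not real customer numbers.

```python
# Sketch of the performance Measure: compare the 90th percentile of
# transport-and-latency times before and after CDN is enabled.
import statistics

def p90(samples: list[float]) -> float:
    """90th percentile of a list of latency samples (milliseconds)."""
    return statistics.quantiles(samples, n=10)[-1]

def p90_reduction(before_ms: list[float], after_ms: list[float]) -> float:
    """Fractional reduction in p90 latency after CDN was enabled."""
    before, after = p90(before_ms), p90(after_ms)
    return (before - after) / before

before = [320, 410, 290, 500, 650, 380, 720, 455, 610, 530]
after  = [210, 260, 190, 330, 400, 250, 460, 300, 390, 340]
print(f"p90 latency reduced by {p90_reduction(before, after):.0%}")
```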

We also wanted the feature to provide noticeable value to the customer. We didn’t want them turning it on, finding little use for it, and subsequently switching it off — Feature Churn. We therefore measured the rate at which our customers enable, but then disable, the feature. It’s a Retention Measure.
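
The retention Measure could be sketched along these lines, again with hypothetical event names standing in for whatever is actually implemented.

```python
# Sketch of the retention ('Feature Churn') Measure: of sites that enabled CDN,
# what share later disabled it? Event and field names are hypothetical.
def cdn_feature_churn(events: list[dict]) -> float:
    """events: e.g. {"site": "site-7f3a", "action": "cdn.enabled"}, in time order."""
    enabled, disabled = set(), set()
    for e in events:
        if e["action"] == "cdn.enabled":
            enabled.add(e["site"])
        elif e["action"] == "cdn.disabled" and e["site"] in enabled:
            disabled.add(e["site"])
    return len(disabled) / len(enabled) if enabled else 0.0

events = [
    {"site": "a", "action": "cdn.enabled"},
    {"site": "b", "action": "cdn.enabled"},
    {"site": "a", "action": "cdn.disabled"},
]
print(f"feature churn: {cdn_feature_churn(events):.0%}")  # 50% in this toy example
```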

Everyone is just so happy now that the GSM Play has been rolled out across Atlassian. Except Dan. Dan looks really sad thinking about all of the ways he could have used GSM before now.

The examples above are really only a subset of the Measures we documented prior to shipping our CDN integration. But already you can see that with only these three measures our product teams would need to agree on, and implement, several analytic events that they might not have thought about prior to the GSM Play.

This is perhaps one of the biggest strengths of the GSMs. It takes the guesswork out of a product team’s roadmap and provides strong guidance on what is expected from the analytics events they are implementing in our products.

A really valuable exercise we do with our stakeholders is to get them to write down (in the GSM documentation) the exact way we would construct a SQL query that examines their Measures. What filters would we need? If the product team aligns on a success Measure and subsequently implements some analytics events, we ask them: “How do we make the chart you want to see using the analytics you have implemented in this GSM?”
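
The same exercise can be sketched in code. Assuming the agreed-upon events land in the data lake as a table of site-level records (the table and column names below are stand-ins, not our real schema), the chart behind the adoption Measure is only a filter and a group-by away.

```python
# A sketch of the "how would we actually chart this?" exercise, using pandas.
# Column names and thresholds are invented stand-ins for whatever lands in the data lake.
import pandas as pd

events = pd.DataFrame([
    {"site": "a", "users": 1200, "cdn_enabled": True,  "month": "2020-01"},
    {"site": "b", "users": 900,  "cdn_enabled": False, "month": "2020-01"},
    {"site": "c", "users": 1500, "cdn_enabled": True,  "month": "2020-02"},
    {"site": "d", "users": 2000, "cdn_enabled": False, "month": "2020-02"},
])

MIN_USERS = 500  # the filter the team has to agree on up front
eligible = events[events["users"] >= MIN_USERS]
adoption_by_month = eligible.groupby("month")["cdn_enabled"].mean()
print(adoption_by_month)  # the series behind the adoption chart
```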

Change can be hard, and implementing this across the organization took some time.

We had two things in mind when we sought to introduce this new Play to our development process:

  1. Workload: GSMs not only shed light on the quality of the output delivered by product teams, they also add to teams’ existing workload. Hence, we knew we had to design the GSM Play to be as lean as possible and make sure it integrated seamlessly into existing workflows.
  2. Timing: It’s important to align on success metrics as early as possible in the process of developing a feature. That ensures every team member is on the same page when it comes to the needle we want to move.

We took the above into account when designing the first prototype of the GSM process and added resources, such as in-depth documentation and templates. Further, we pressure-tested the process with central members of the Jira, Confluence, and Bitbucket product teams. Then we started a trial period for all three big products simultaneously. We learned a lot and fine-tuned our process prior to rolling it out to all products.

The GSM Play itself can be found as a template within the Atlassian Playbook. If you use Confluence Cloud, the GSM template can be found when creating a new page.

Lessons learnt

Not everything went smoothly, and retrospectively we would have done a couple of things differently.

  1. Prototyping with three big product teams simultaneously is a lot of work and things moved slowly. In hindsight we’d probably have prototyped it with only one product team first. This would have also allowed us to share early wins, and challenges, with the other teams.
  2. Measuring success and failure is a sensitive topic. Sometimes things don’t work out as expected, and you need to be mindful of your teammates, who may have spent months or even years on a feature. GSMs are all about receiving feedback and acting on it, but there is a real danger if negative results dent a team’s morale instead of initiating positive change. Hence, we needed to make it very clear that negative results are not the end of the world; rather, they should trigger teams to go back and iterate.
  3. Allowing teams to tweak the GSM process according to their needs was another big step that helped to make this initiative successful. In an agile working environment, it’s crucial to give teams the flexibility they need to be successful. Hence, we ask our teams to measure the impact of their actions and offer the GSM as a framework that they can tweak in any way they want. If they find a better way to measure their success they can do that. Just for the record, so far no team has dropped out.

Learn more about Data at Atlassian

Stay in touch with the Analytics & Data Science team and learn more about careers at Atlassian here.

About the Authors

Dr. Luke Hedge helps product teams develop awesome software using data. He also attempts to make listenable music. He does one better than the other. He gained a PhD in Biology from the University of New South Wales in 2013, before turning his attention to technology and data science and joining Atlassian as a Senior Product Analyst in 2016.

Nina Kumagai is an Associate Product Analyst specialising in Behind the Firewall deployment options at Atlassian. Nina loves data but she also loves eating ramen out every weekend.

Michael Ebner is a Principal Product Analyst at Atlassian and is based in Sydney, Australia.
