How to measure user satisfaction with the Apdex metric

In a microservices environment we end up with lots of different applications running in production, each of them handling production traffic 24/7 without a pause. As your business grows, it is crucial that each of your components works as expected. In this post I’ll talk about how you can measure user satisfaction with the Apdex (Application Performance Index) metric.

In my previous article we reviewed how to implement a microservice infrastructure in AWS that lets us grow without limits. What if one of your 20+ microservices starts throwing errors because of a database connection error? Or worse, what if the response time of one of your microservices increases fivefold? You need a way of tracking and measuring how your users feel about your application’s responses, and of taking action based on that measure!

Introducing the Application Performance Index (Apdex) metric

OK, so we’re worried about how our users feel about our application overall. Let’s think for a moment… How can we define a good web application that we all want to use? Let’s try to come up with an abstract definition:

  • It is crazy fast. The response time is really low. Users don’t have to wait several seconds for a response.

  • The error rate is negligible or nonexistent. After testing and using this application in production we never got an error response (HTTP status code >= 500). Or, if we did get errors, they were resolved really fast and the impact was minimal.

  • It is consistent. Response times and errors are in line with what we’re sending, and the application returns the data we expect, so the business can rely on it.

So we can see three different aspects of the application’s behavior: response time, error rate and consistency.

The Application Performance Index is an open industry standard for measuring the performance of software applications. We can use this metric as a simplified Service Level Agreement (SLA) to check that what our users expect from our application is what we’re actually providing.

Now that we have our metric definition, let’s take a look at how it works.

How the Apdex metric works

The whole idea behind the Apdex metric is to unify different measurements into a single, useful metric we can rely on. So imagine for a moment that, based on where our microservice sits in the overall architecture and its impact on the business, we need this application to respond in 40 milliseconds or less.

OK, so if we go to AWS and take metrics from our Elastic Load Balancer (ELB) we can see this metric right away:

Performance metrics from an Elastic Load Balancer in Amazon Web Services

Now we can see our current response time. So if we use this metric to generate an alert then we’re ready to go, right? Well, not so much… What if we’re responding in less than 40 ms but all of our responses are internal server errors (HTTP 500)? Our response time is great, but our responses are useless to the user because we always return errors… So we need to take another set of metrics from the ELB as well…

Performance metrics from an Elastic Load Balancer in Amazon Web Services

Now that we have both failed and successful responses we can see how well we’re doing overall. But first, let’s take a look at the separate metrics we need in order to calculate that:

  • Target Response Time (Milliseconds).
  • Requests (Count).
  • HTTP 5XXs (Count).
  • HTTP 4XXs (Count).
  • HTTP 2XXs (Count).

We could create a monitor for each of these metrics separately, but that would leave us with a set of 5 different metrics and monitors for each of our applications. There must be a better way…

Apdex metric calculation

As we said, the entire goal of the Apdex metric is to unify different measurements into a single, useful metric. That’s exactly what we’re going to do.

First of all, we need to define a threshold value T (in seconds) representing our target SLA for the response time. Each response is then classified against this threshold as follows:

  • If the response time is <= T, the response is satisfied.
  • If the response time is > T and <= 4T, the response is tolerated.
  • If the response time is > 4T, the response is frustrated.
  • If the HTTP status code is >= 400 (4XX or 5XX), the response is frustrated, regardless of its response time.
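The rules above can be sketched as a small classification function. This is only an illustration: the names `Rating` and `classify` are made up for this example, and it assumes the error rule takes precedence over the response-time rules.

```python
from enum import Enum


class Rating(Enum):
    SATISFIED = "satisfied"
    TOLERATED = "tolerated"
    FRUSTRATED = "frustrated"


def classify(response_time_s: float, status_code: int, t: float) -> Rating:
    """Classify one response against the threshold t (in seconds)."""
    # Assumption: a 4XX or 5XX response is frustrated no matter how fast it was.
    if status_code >= 400:
        return Rating.FRUSTRATED
    if response_time_s <= t:
        return Rating.SATISFIED
    if response_time_s <= 4 * t:
        return Rating.TOLERATED
    return Rating.FRUSTRATED
```

For example, with T = 0.5, a successful response that takes 1 second is tolerated, while one that takes 3 seconds is frustrated.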

Based on this threshold T we can define our Apdex metric as follows:

Apdex metric formula
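For reference, the standard Apdex formula counts every satisfied request in full and every tolerated request as half, then divides by the total number of samples. Frustrated requests add nothing to the numerator:

```latex
\[
\mathrm{Apdex}_T = \frac{\text{Satisfied} + \dfrac{\text{Tolerated}}{2}}{\text{Total samples}}
\]
```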

Let’s see this formula applied to a real example:

Example of a healthy Apdex score

This is a really healthy Apdex score. As you can see, the metric falls in the range [0, 1], where 0 means users are completely frustrated and 1 means users are completely satisfied.

  • You can see that T = 0.4, meaning 400 milliseconds. That’s our SLA for this particular microservice. Every request must be handled in 400 ms or less in order to be considered satisfied.

  • The sample size for that particular minute was 48934 requests.

  • 48835 of those requests were satisfied because:
    • The response time was no greater than our threshold T.
    • We didn’t have any error when processing that request.

  • 98 of those requests were tolerated because:
    • The response time was greater than our threshold T but no greater than 4T.

  • Only 1 of those requests was frustrated because:
    • The response time was greater than 4T, or
    • The status code of the response was >= 400 (4XX or 5XX).
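To check the arithmetic, here is a minimal sketch that plugs the counts above into the standard Apdex formula, (satisfied + tolerated / 2) / total. The function name `apdex` is just an illustrative choice.

```python
def apdex(satisfied: int, tolerated: int, frustrated: int) -> float:
    """Return the Apdex score for the given request counts."""
    total = satisfied + tolerated + frustrated
    if total == 0:
        raise ValueError("Apdex is undefined for an empty sample")
    # Satisfied requests count in full, tolerated ones count as half,
    # and frustrated ones only add to the total.
    return (satisfied + tolerated / 2) / total


# Counts taken from the example above (48934 requests in total).
score = apdex(satisfied=48835, tolerated=98, frustrated=1)
print(f"{score:.4f}")  # prints 0.9990
```

So with 48835 satisfied, 98 tolerated and 1 frustrated request, the score comes out at roughly 0.999, very close to the maximum of 1.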

Now let’s see an example of a not-so-healthy Apdex score:

Example of an unhealthy Apdex score

Now this doesn’t look so good. From the image we can see that we have defined T = 0.5, meaning 500 ms or less. As you can see, we have a lot of tolerated requests, and the reason is that the response time for each of those requests was > T but <= 4T. So in this case, in order to comply with the SLA we have defined, we need to reduce the response time of this application. How can we analyze and fix this? Well, that’s a matter for a different article.

We have now reviewed how you can measure user satisfaction with the Apdex metric.

I hope you’ve enjoyed the article! If you have any questions or comments, let me know!

Fede
