
Client-centric metrics

Here's an example of a statistic that is practically meaningless from a user's perspective (it comes from the free version of a monitoring service by host-tracker.com, which has been checking hobbsontech.com every thirty minutes since November 21st, I believe for a valid HTTP response):

Straight off, let me say that this is useful for troubleshooting. If someone tells me they couldn't get to this site at noon yesterday, I can quickly check whether the site was totally down for long enough that the monitoring would have caught it. In a more sophisticated environment, one can quickly check whether a particular server in a cluster, for example, is bad and needs to be dealt with or taken offline. But this kind of metric shouldn't be confused with client-centric metrics, and you should set your sights on statistics that come closer to the user experience. Some key things to keep in mind when defining client-centric metrics:

  • Don't use averages. Users don't think in terms of averages, but in extremes (for instance, "yesterday I saw a page with outdated content"). So your goals should be in terms like "page loads in less than five seconds within the institution's firewall 99% of the time, and within 10 seconds 99.99% of the time." Note that all those 9's are not about simple uptime, but about the percentage of time that a metric is met. Also see Amazon's CTO's description of measuring at higher percentiles (one reason he gives is to ensure that high-value clients, with more complicated, personalized pages, have good response times).
  • Don't use planned maintenance windows as an excuse. Service downtime experienced during maintenance windows should be counted as downtime. Planned maintenance is still important, since you can warn your users of downtime (and you can hopefully pick times that are lower-impact). But it's still downtime. Of course, if a server is down but the service is still available, then you shouldn't ding yourself in your client-centric metrics (and you should congratulate yourself for doing things in a way that allows downtime of server(s) without downtime of the service).
  • Try to base your metrics on the way a user experiences your system. This will usually involve more sophisticated analysis of responses from the server(s). For example, don't just check for a successful HTTP response from the server (I could create a page that returns a successful HTTP status code but says "This site is down for maintenance"); check that the page has a valid left navigation, header, and content area (perhaps using screen scraping techniques). Also, except when troubleshooting, don't measure the server time for generating pages, but the time a user would actually experience in downloading a page (including pulling in all the components of the page and, if possible, the time to compute/render the page). If you cache your pages, don't get too hung up on the performance of just the cached pages (the end user won't know whether they are getting a cached or a dynamic page, so your metric has to consider the dynamic pages as well). Another example of a client-centric metric for a large system with a suite of sites: the length of time between content publishing and its appearance on all relevant pages.
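
To make the "don't use averages" point concrete, here is a minimal Python sketch of checking a goal phrased as "within X seconds Y% of the time." The sample load times and the names `fraction_within`, `meets_goals`, and `GOALS` are hypothetical, made up for illustration; the thresholds mirror the example goal in the first bullet.

```python
from statistics import mean


def fraction_within(load_times_s, threshold_s):
    """Return the fraction of page loads that took less than threshold_s seconds."""
    if not load_times_s:
        raise ValueError("need at least one sample")
    hits = sum(1 for t in load_times_s if t < threshold_s)
    return hits / len(load_times_s)


def meets_goals(load_times_s, goals):
    """goals is a list of (threshold_seconds, required_fraction) pairs."""
    return all(
        fraction_within(load_times_s, threshold) >= required
        for threshold, required in goals
    )


# Hypothetical goal: under 5 seconds 99% of the time,
# and under 10 seconds 99.99% of the time.
GOALS = [(5.0, 0.99), (10.0, 0.9999)]

# Hypothetical measurements: 97 fast page loads and 3 very slow ones.
samples = [1.0] * 97 + [30.0] * 3

print(f"average load time: {mean(samples):.2f}s")
print(f"loads under 5s:    {fraction_within(samples, 5.0):.0%}")
print(f"goals met:         {meets_goals(samples, GOALS)}")
```

With these made-up samples the average is under two seconds, which looks healthy, yet three users in a hundred waited thirty seconds and the goal is not met. That gap is the reason to state goals as "percentage of time a threshold is met" rather than as an average.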

But probably most important is to identify client-centric metrics as early as possible, and create a method of tracking these. If possible, you could install a large display outside a manager's office with the metrics in red when you aren't meeting the goals. Here's a table listing some example before/after types of client-centric metrics:

| Not client-centric | More client-centric |
| --- | --- |
| Excluding maintenance windows, the server uptime this last week was 99.9% | 99.7% of all pages in the last week loaded completely (good header, footer, and content area) |
| Excluding maintenance windows, 99.9% of all pages were generated within 1 second by the server | 99.1% of all pages in the last week loaded within 5 seconds inside our firewall |
| 1,000 content items were published yesterday | The 1,000 content items published yesterday appeared on all relevant pages within 30 minutes 99.9% of the time |
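
A "loaded completely" check like the one in the first row could be sketched roughly as follows, using only Python's standard library. Everything specific here is an assumption for illustration: the element ids in `REQUIRED_IDS`, the text in `MAINTENANCE_MARKERS`, and the function names are hypothetical, and a real site would need its own markers.

```python
from html.parser import HTMLParser

# Hypothetical ids of the page regions a user expects to see.
REQUIRED_IDS = {"header", "footer", "content"}
# Hypothetical text that signals a "successful" response that is really an outage.
MAINTENANCE_MARKERS = ("down for maintenance",)


class _PageScanner(HTMLParser):
    """Collects element ids and visible text from an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.text = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)

    def handle_data(self, data):
        self.text.append(data)


def page_is_complete(html):
    """True if all required regions are present and no maintenance text appears."""
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()
    visible_text = " ".join(scanner.text).lower()
    if any(marker in visible_text for marker in MAINTENANCE_MARKERS):
        return False
    return REQUIRED_IDS <= scanner.ids


def complete_fraction(pages):
    """Fraction of fetched pages that pass the completeness check."""
    return sum(page_is_complete(p) for p in pages) / len(pages)


good = '<div id="header">H</div><div id="content">C</div><div id="footer">F</div>'
maintenance = "<html><body><p>This site is down for maintenance</p></body></html>"
print(complete_fraction([good, maintenance]))
```

Both sample pages could arrive with a successful HTTP status, but only the first one counts toward the client-centric number. The page bodies would come from whatever fetches your monitoring already performs; fetching is left out of the sketch on purpose.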

Responding to urgent user issues

In any system that's actually used (!), you're going to get user reports of problems that need a fast response, such as access issues (as opposed to enhancement or bug fix requests). Unfortunately, a common and very difficult type of problem is an intermittent issue, or one that you cannot reproduce. Even in that case, here are some rules of thumb for responding to a user report:

  • Identify whether or not you saw the problem (and never, ever close a ticket just because you don't see the issue -- at a minimum, contact the user *before* declaring that something couldn't be reproduced).
  • If you cannot reproduce the problem, then try to walk through the exact steps with the user on your desktop (or by sharing their desktop). Of course, the user may resist this, since they're already frustrated if they contacted you. But as we all know, a problem may only occur in a very specific situation that you never encounter yourself, although it may be the *only* way that particular user does things (so the user feels it always happens). If there is another way of doing the same thing, then suggest it as a workaround.
  • Clearly indicate if you *did* something for the problem to go away.
  • Ask for confirmation that the *user* thinks the issue is resolved (this one is important but easy to overlook).
  • Make it clear that the user can get back to you with any follow-up questions.

An example response that isn't very useful to a user: "Try now" and nothing more (the user doesn't know if you did anything, and they might think you don't believe anything was wrong in the first place). 

Giving vs. Taking

A couple rules of thumb about giving vs. taking:

Rule 1. If you give something away for free that you would normally charge for, then make it clear you are waiving the normal rules (and that you may not be doing so in the future).

Rule 2. If you didn't follow Rule 1, then be very wary of changing your policy of giving that something away (even if it is in the manual or contract or whatever). Why? It will feel like you are taking something away (whereas if you had followed Rule 1, it would feel like you were *giving* something).

This is all *especially* true if it's something that would be easy for the user/client to overlook.

It's very easy to point to the contract, or to the manual, about some policy. But if you have routinely not been following the policy, or not charging for something, then people will continue to expect that they will get it for free. Springing what appears to be a new requirement or cost on them will probably not go over well.

Example One: Let's say a contract stipulates $100/month extra for priority support. If you've silently been giving that priority support for 6 months at no additional cost, and then call up the client to say you'll charge for it in the future, they'll be upset. It'll seem like you're suddenly charging them way more money. It's a lot better to put something like "$100/month fee waived for initial six months" on the bill in the first place, so you're *giving* something for six months. Of course, if you mistakenly did not charge that extra amount, then you may want to consider giving a grace period before you start charging it (so that the user has time to wrap their head around the idea, and will also feel that at least they are getting something for free a while longer).

Example Two: Your product's manual says your content should be 600 pixels wide. But your product never enforced that, and, although wider pages didn't look perfect, they didn't look horrible either. If you suddenly change the system (for other good reasons) so that these wider pages inadvertently look bad, just pointing to the manual to say they should have kept their content to 600 pixels wide will annoy the users. It would be better to enforce the rule in the first place, or at least to remind people that you are waiving that restriction but may require it in the future. After the fact, once wide pages suddenly look worse, you can also offer to help your users review and change the size of their problem pages. Also, if you can fairly easily change the system to be a bit more lax on the requirement, then it would be better to do so.

Obviously, these types of issues may arise because of an oversight on your part (you forgot to charge the additional $100/month in the example above), but in general the main things to try to keep in mind are: a) these types of details *are* important, so try to keep an eye on them, b) try to remind people when you are temporarily waiving a fee or restriction, and c) by all means, don't just flippantly point to the manual, contract, or other document. Of course, there may be times when you do need to fall back to pointing to the document/agreement, but carefully consider the options before doing so.

"Just like current system"

We all encounter users with requirements like this: "This is easy. A college intern set it up in 10 minutes.  We just need you to put that functionality in your fancy-pants system."  (Well, maybe not that exact wording).

I encountered something like this a while ago. The basic requirement was to generate a little web report in a table. We implemented this in our system, and the user immediately called, upset that it wasn't working -- it turned out they had been copying and pasting from the web page into Excel, and this no longer worked as they expected.

Now I know to ask more questions (and give this dangerous flavor of requirement its proper respect). Some questions to pose:

  • do all the users of this system/functionality use the same system/platform (if I watch how you use the system, is that sufficient)?
  • what other systems will use the output of this system?
  • how critical is this system? are you willing to live with some hiccups?
  • are you tied to the exact look and feel of the current system?
  • how do you currently manage your site, and the content and other data on the site?

In a lot of ways, this is a much scarier requirement than a brand new one. Even if there are misunderstandings in the requirements of a new system, at least everyone understands that this can happen. But a quick ctrl-a ctrl-c ctrl-v can be all it takes for a user to prove that your implementation doesn't do what the previous one did.

You should try to sit down with the user and watch them actually use the functionality. If at all possible, you should also walk through the system yourself, noting any potential complexities (and discussing these potential complexities with your client). Emphasize that there may be details of the current system that neither they nor you were aware of. Hopefully you can also concentrate on *improvements* that can be made to the system, so they aren't as disappointed with small setbacks.

As to phasing, if possible, try to first deliver a pilot or beta so they can play with it. Also, you need to be ready to "fix" things after delivery.