Customer flow is the key to Web Reliability (WR). A customer who can't complete a search on a website is a customer who is not in the flow. When customers don't flow, neither does money. Our client realized this and called us in to help because their developer couldn't get site search to load and couldn't find a solution.
My client's website, let's call it ghchaise.com, was built on a CMS that we have expertise in. My company had even developed a performance evaluation service for troubleshooting clogs and bottlenecks in a web property and resolving them.
At the time, the Web Reliability Framework did not exist. If it had, we would have used it to troubleshoot the problem. The experience I gained from this client is part of why there is such a thing as the Web Reliability Framework. So let's retroactively apply the framework and see if it can get us to a solution.
Our client, let's call her Claire, was under a huge amount of stress when she called us. Revenue was down. Money wasn't flowing through the ghchaise.com website. The money wasn't flowing because the customers weren't flowing. Everyone who came to the site was getting stuck on search. They couldn't move past it. They were clogged.
The ghchaise.com website had tens of thousands of products to choose from. These were fabrics and textiles of high quality that designers could use to create furniture, drapery, wall hangings, etc. The catalogue was so vast that web search was really the only sensible way to get customers to the products they wanted. Search was not about keywords here. It was about filtering and browsing across at least 15 attributes, the permutations of which yielded millions of search combinations.
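To get a feel for how 15 attributes turn into millions of search combinations, here is a rough back-of-the-envelope sketch. The option counts are hypothetical, not taken from the real catalogue. It assumes every attribute has just 2 options and can also be left unset, and it counts each distinct filter state once.

```python
from math import prod

# Hypothetical: 15 filterable attributes with only 2 options each.
# The real catalogue's option counts are not known here.
options_per_attribute = [2] * 15


def count_filter_combinations(option_counts):
    """Count filter states when each attribute is either unset or set to one option."""
    # Each attribute contributes its options plus one extra choice: "any".
    return prod(n + 1 for n in option_counts)


combinations = count_filter_combinations(options_per_attribute)
print(f"{combinations:,} possible filter combinations")  # 14,348,907
```

Even with this deliberately small assumption the count passes 14 million, and real attributes such as color or pattern would likely have far more than 2 options.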
You can probably already see the problem taking shape. Our client did not have reliable customer flow through her website. So there was no customer flow to turn into reliable revenue.
In WR we start by drawing a tic-tac-toe board. You can use a piece of paper, a napkin, a whiteboard. It doesn't matter. Across the top of the board, above the 3 columns, write Team, Plan, Action. Down the side, next to each of the 3 rows, write Motivation, Resistance, Management. Above the board, at the top of the page, write the customer mission statement. This is the who, what and why: the customer's desire in coming to the website. Work your way through the board, ranking the 9 attributes of WR with X's and O's. O's are awesome. X's are terrible.
Claire made it clear to us who her customer was, and why they were coming to the website. So let’s convert her description into an empathetic mission statement written from the customer's point of view.
"I am Janice. I design high-end couches for L.A. and N.Y. clients. Right now I am working with the biggest client I've ever had, a celebrity in L.A. They're cool. They're hip. They're demanding. I need to blow them away with my design for their living room remodel. I've always wanted to use ghchaise.com, but have rarely had a budget to allow it. This is a huge opportunity. I need to get into the site, find what I want fast, and order the samples I need to wow my clients."
That's too long, of course. Let's revise down to this: "I'm Janice. I'm an up-and-coming designer under a lot of pressure. I need to search quickly and easily on GH Chaise so that I can find awesome textiles to wow my clients."
We have our customer mission statement. Now we fill in the board. We score each Web Reliability attribute with an X or an O. O's represent smooth flow. X's represent blockage. We like to dive straight into the worst and most obvious issues in Web Reliability. Then we work our way around the board from there. The customer mission statement guides us.
Our client Claire came to us and made it clear that the site was crashing during search. Someone like Janice would come in and start filtering and searching on products. The browser or the server would somehow get overloaded and die. Janice would get stuck, then frustrated, and would soon go order samples from some other provider's website. Claire would lose a customer, most likely permanently.
WR encourages us to be honest and fearless, and dive into trouble. If we think we may have a big problem somewhere, we start with that. In our case, it's site search that's crashing. Any kind of crash on a website is a form of resistance. A website can resist customers by being slow or it can resist them by being totally dead. Since the ghchaise.com website is already live on the web, and not in a planning stage, our focus goes to the Action/Resistance cell of our board (the intersection of the Resistance row with the Action column). The Action column refers to in-progress, real-time, live functioning systems. We're giving this cell an X. Dead websites are the worst.
WR can show us why we are getting an X in one cell based on causes in other cells. Problems with flow are almost always connected to multiple issues across a site. When we get an X in Action/Resistance, we immediately know to be suspicious about the Plan, in this case the architecture of the site. For example, if a site is crashing due to unexpected traffic spikes, the problem isn't that too many people are coming to the site. The problem is that the site was not prepared for success, a problem at the planning level. The site crashes when people search, so the plan for supporting search was faulty somehow. A crash gets an X, so the planning that resulted in a crash also gets an X. That goes in the Plan/Resistance cell. The site architects did not properly plan to handle the resistance that can come from complex search functionality overloading the server or the browser.
Now we need to look harder at the plan. Was this plan validated effectively before being put into action? Based on the consistent crashing of search, it's clear that whatever type of validation was used and whatever type of management oversight was involved both failed. The fact that search was going to cause a site crash was completely missed by everyone involved. So we give an X to the Plan/Management cell. It sounds harsh, but the site is crashing. This is a failure. We need to be brutally honest in order to fix it.
When we look deeper into the issue, we trace it back to the people who came up with this plan that has failed so badly. We learn that the plan did not go through a validation or testing process, and nobody in charge of oversight of the team or the project appears to have ever asked for one. This type of failure, really an abdication of responsibility, deserves an X.
Our board already has 3 X's. When you have an X anywhere on the board, you feel like you should get to work right away and not worry about the other cells. However, WR sees things differently. Websites are highly complex and intricate. Knotty web problems usually involve interconnecting layers of issues. WR tries to untie these knots and show the interconnections. So we stay focussed and keep working our way through the board.
We can give an O to Team/Motivation. The team who built the site and the client who hired them were all motivated to do excellent work. They just had a bad plan. The plan was good at the Plan/Motivation level. It factored in a number of methods to keep the customer, Janice, engaged and motivated. We can give this an O. Janice's ongoing action in real time was motivated. When the site did not crash, she was able to smoothly act on her motivation. This cell, Action/Motivation, also gets an O.
All that's left is Team/Resistance, Action/Management and Team/Management. Not only was the team motivated, but they worked well together. Communication was solid and methods were good. There was no team resistance, so Team/Resistance gets an O. There was just a failure of planning and a failure by management to detect it through plan validation. Action/Management was also fine and gets an O. This is the monitoring level, where we have systems that watch server performance and signal problems. Our search problem was a fast crash, not a slow burn. Real-time monitoring would not have flagged a problem that could have been remedied by, say, allocating more resources.
Remember that we have an X in Plan/Management. The plan was not validated. Plans are created by teams. Teams are held together by managers. Team/Management failed here somehow. This gets an X.
We focus on X's in WR. X's in planning get priority. A bad plan cannot be remedied by good action. So there is something wrong with our plan for supporting search. We know we need to dive in here and question the architecture.
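If you would rather keep the board in code than on a napkin, the scoring so far can be sketched like this. The cell values come from the walkthrough above. The dictionary layout and the helper names render and prioritized_xs are our own illustration, not part of the framework, and the ordering beyond "Plan cells first" is an assumption.

```python
COLUMNS = ("Team", "Plan", "Action")
ROWS = ("Motivation", "Resistance", "Management")

# Scores from the walkthrough, keyed as (column, row).
board = {
    ("Team", "Motivation"): "O",
    ("Plan", "Motivation"): "O",
    ("Action", "Motivation"): "O",
    ("Team", "Resistance"): "O",
    ("Plan", "Resistance"): "X",
    ("Action", "Resistance"): "X",
    ("Team", "Management"): "X",
    ("Plan", "Management"): "X",
    ("Action", "Management"): "O",
}


def render(board):
    """Return the board as a small text grid, columns across the top."""
    lines = [" " * 12 + "".join(f"{c:<8}" for c in COLUMNS)]
    for row in ROWS:
        cells = "".join(f"{board[(c, row)]:<8}" for c in COLUMNS)
        lines.append(f"{row:<12}{cells}")
    return "\n".join(lines)


def prioritized_xs(board):
    """List the X cells as 'Column/Row', with Plan cells moved to the front."""
    xs = [(c, r) for r in ROWS for c in COLUMNS if board[(c, r)] == "X"]
    # sorted() is stable, so non-Plan cells keep their board order.
    xs = sorted(xs, key=lambda cell: cell[0] != "Plan")
    return [f"{c}/{r}" for c, r in xs]


print(render(board))
print("Work on first:", ", ".join(prioritized_xs(board)))
```

The last line lists Plan/Resistance and Plan/Management ahead of the other two X's, which matches the rule that planning problems get priority.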
Eventually even this CDN caching plan began to fail because the traffic on the site combined with the number of search permutations meant that we would never have a 'warm cache'. A 'warm cache' refers to having a suitable number of pages ready and available for use in the caching tool, in this case CloudFlare. CloudFlare naturally purges stale and unused URLs from its cache. If the site had a lot more traffic, CloudFlare would have kept more pages warm in the cache. There was a mismatch here. The plan was still bad.
At this point we still had X's on the board, but fewer of them thanks to our rearchitecting and caching work. Though we had changed several X's to O's, the goal was to get rid of all X's. We still had the original X, slow search.
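The mismatch between traffic and search permutations can be roughed out with a little arithmetic. The sketch below is a simplified model, not a description of how CloudFlare actually manages its cache. It assumes every distinct search URL is equally likely to be requested, that nothing is evicted within one retention window, and that the first request for any URL is always a miss. The URL and request counts are hypothetical.

```python
from math import expm1, log1p


def expected_hit_ratio(distinct_urls: int, requests_in_window: int) -> float:
    """Estimate the share of requests served from cache within one retention window.

    Simplified model: uniform random access over distinct_urls, no eviction
    inside the window, and the first request for each URL is a miss.
    """
    if distinct_urls < 1:
        raise ValueError("distinct_urls must be at least 1")
    if requests_in_window <= 0:
        return 0.0
    if distinct_urls == 1:
        expected_distinct = 1.0
    else:
        # Expected distinct URLs seen: N * (1 - (1 - 1/N) ** R),
        # written with log1p/expm1 to stay accurate when N is huge.
        expected_distinct = -distinct_urls * expm1(
            requests_in_window * log1p(-1.0 / distinct_urls)
        )
    # Every distinct URL costs exactly one miss; the remaining requests are hits.
    return 1.0 - expected_distinct / requests_in_window


# Hypothetical numbers: a small site section versus millions of search URLs.
print(f"100 URLs, 100,000 requests:      {expected_hit_ratio(100, 100_000):.1%}")
print(f"14 million URLs, 50,000 requests: {expected_hit_ratio(14_000_000, 50_000):.1%}")
```

Under these assumptions the small section is served almost entirely from cache, while the search pages almost never are. Real traffic is skewed toward popular searches, so actual hit ratios would be higher than this uniform model suggests, but the direction of the problem is the same.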
After more trial and error, more Plan/Management activity, we eventually found a new hosted search service called Algolia. Algolia allows you to populate its search indexes with your data. You can then query the indexes over their API and get results back incredibly quickly. The user gets an experience that feels almost instantaneous. The speed of the system, since it was designed explicitly for fast searches, blew our minds. With Algolia, no caching was needed. We just kept the Algolia index fresh by pushing the current product content into it. Having all of the product information kept 'warm' in the Algolia index meant we received lightning-fast multifaceted search results. This simple, elegant architecture finally transformed the rest of the X's on the board into O's.
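To make the index-then-filter idea concrete, here is a minimal sketch of the two data shapes involved. It does not call Algolia at all. The product fields (sku, color, material, pattern) and both helper names are hypothetical. What it relies on from Algolia is that records are identified by an objectID and that the facetFilters search parameter takes "attribute:value" strings, where the outer list is combined with AND and each inner list with OR.

```python
def to_record(product):
    """Shape a product row as an index record, using the SKU as its objectID."""
    record = dict(product)
    record["objectID"] = str(product["sku"])
    return record


def build_facet_filters(selection):
    """Turn the customer's filter choices into a facetFilters-style nested list.

    Values chosen for the same attribute are alternatives (inner list, OR).
    Different attributes must all match (outer list, AND).
    Attributes with nothing selected are skipped.
    """
    return [
        [f"{attribute}:{value}" for value in values]
        for attribute, values in selection.items()
        if values
    ]


# Hypothetical product and filter selection.
record = to_record({"sku": 1042, "color": "blue", "material": "linen"})
filters = build_facet_filters(
    {"color": ["blue", "green"], "material": ["linen"], "pattern": []}
)
print(record["objectID"])
print(filters)
```

In a setup like ours, records shaped this way would be pushed to the index whenever product content changes, and the nested list would be sent along with each search request.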
The search function had not just been fixed but upgraded. Customers coming to the site were able to find what they were looking for and achieve their goals. Customer flow was restored. And with reliable customer flow came reliable revenue.