One afternoon I received an email from my contact in Sunnyvale, California. Let's call him Abe. He seemed stressed out. He told me he was responsible for getting a major website launched, a site that would serve a developer network meant to compete with the Apple App Store. The effort was part of a multi-part push to compete with the iPhone and to try to save the company as a whole. This website was important. A lot was riding on it. Trouble was, they couldn't finish. They couldn't launch. They were having problems with the development team and with the technology stack that had been adopted.
It reminds me of an old proverb:
“For the want of a nail the shoe was lost,
For the want of a shoe the horse was lost,
For the want of a horse the rider was lost,
For the want of a rider the battle was lost,
For the want of a battle the kingdom was lost,
And all for the want of a horseshoe-nail.”
Abe reached out to me because he knew my company had deep expertise in ExpressionEngine, a Content Management System (CMS) that his company had trusted and used successfully on smaller initiatives. We’d worked with this client before on some small customizations and non-critical web tasks, so we were trusted too. Abe was in a bind and needed someone to help him get unstuck. Several layers of VPs and other executives were leaning on him to get the project done. By the time he reached out to me, he had only 90 days left to finish and launch the website. The clock was running down, and Apple was about to declare victory.
Abe should never have had to call me in on this job. He should not have let things get to the point where the shot clock was down to its last few seconds and he had to hurl the ball the length of the court at the buzzer. And score. He had good instincts and a lot of experience, so he had a well-developed feel for the big picture of a website and its problems, even without a tool like the Web Reliability Framework. Nevertheless, it would have helped Abe a great deal to have a tool, a roadmap, a language for seeing a problem as it formed and making the case to resolve it before it grew too big. This kind of situation is one of several that Web Reliability is designed to address.
So imagine that the Web Reliability Framework had existed during Abe's time at the big mobile multinational, and that he learned about it and decided to give it a try. Let's run his problem through the framework and see whether it could have helped Abe understand his problems more clearly, start addressing them, reduce his stress, and deliver the website in a more sustainable and reliable way.
Abe can use Web Reliability (WR) as a troubleshooting tool. He can also use it as a progress monitoring tool. In our case, Abe would have most appreciated its help as a planning tool. So let's imagine that Abe is about a month into his project. He's at lunch in Sunnyvale with colleagues from his department. They're having Phở. Noodles and broth are splattering on the table and on people's clothes. The team is distracted. Their guard is down. They're being blunt and honest.
Abe grabs a napkin and a pen. He draws a big tic-tac-toe board on the napkin. Above the 3 columns he writes Team, Plan, Action. Down the side, labeling the 3 rows, he writes Motivation, Resistance, Management. Above the board he writes 3 sentences: who, what, and why. He makes sure to state the customer’s primary reason for coming to the website, and to capture his empathy for the customer's pain and the customer's desire to resolve it.
Across the top of the napkin, in his customer's words, Abe writes, "I am Tran, a mobile app developer. I have a great idea for a killer app for X device. I could build it for iOS, but I'm hoping to make more money on X platform and have an easier time building my app because of super helpful documentation that is easy to find, quick to load and quick to digest."
Abe and his colleagues are satisfied with the empathetic customer statement. Now they need to fill in the framework and score the current state of their effort. Ideally the score stays as close to zero as possible; if it doesn't, that signals a serious problem with the website project.
Abe knows that the whole point of the Web Reliability Framework is ensuring website reliability. A website is deemed reliable when it consistently generates revenue for its owner. Abe knows that developer.bigcompany.com will do that only when customers are able to flow into, through, and out of the site successfully and with as little friction as possible. Abe's problem is that nothing is flowing, because there is no website yet. Can WR still help? Yes.
Abe begins scoring his developer.bigcompany.com website using the 9 cells of the framework. He thinks about how frustrated he and his team have been by their inability to get the Amsterdam-based development team to show up for the conference calls needed to make steady progress on the build. Because he’s read the Web Reliability Framework description, he knows this is a Team/Resistance problem. He finds the cell where the Team column intersects the Resistance row. This is the third week that Abe's team has been blocked by timezone problems with the team in the Netherlands. Abe marks the Team/Resistance cell with an X.
Apart from customer desire, team is the most important part of Web Reliability. With a strong, engaged team, every other WR problem can be overcome. In Abe’s situation, very little is getting done because of friction at the team level. One of Abe’s colleagues, April, asks what would be involved in just firing the Dutch team and finding a different one. Abe points out that his boss, the VP of Marketing, after consulting with the VP of Technology, committed to using a specific internal Java platform as the website's CMS. This choice means the Dutch team is the only team that can do the work. April looks at the tic-tac-toe board on the napkin. She points out that technology choices fall under the Plan column in WR.
April points to the Plan/Resistance cell on the napkin and suggests that Abe mark it with an X. Abe resists, saying maybe things aren't that bad. April points out that 3 weeks have gone by with very little progress, time is quickly running out, and there's no resolution in sight. She spells out how the rigid technology plan has created so much resistance that the only solution to their current problem, firing an underperforming team, is entirely blocked. The dev team is stalled, and it will stay stalled because the technology choice has locked everyone into position. April wins. Abe writes an X for Plan/Resistance. With April's help, using the WR Framework, Abe can now see clearly that his website development project is stuck in gridlock. He doesn’t like it, but it’s true.
April has a lot of opinions about the technology choice made by the VPs, and argues that the internal Java platform obviously needs to be abandoned in favor of something more reliable that can be built quickly. Abe looks back at the napkin and sees that the Team/Motivation cell still needs a score. He wonders aloud whether the Dutch team could be motivated to re-engage if his team offered them a timezone compromise. Abe is trying hard to be optimistic. He suggests that maybe the director of the Dutch team can escalate the urgency and get the devs to power through and complete the build despite the technology problems. April reminds Abe that they tried this 2 months ago on another project, and nothing changed. She argues that they have to be realistic; there is no time for anything else. Team/Motivation has to get an X too, she says. Abe sighs and agrees. (No offense to my Dutch friends generally. You guys are swell. There were just a few bad apples on this job.)
Abe, April and the gang already see 3 X's on their board. They know they are in trouble. But the noodles aren't finished, so they keep going. Abe asks, "What's our Plan/Management score? How did we check our work and validate our plan before choosing the internal Java platform?" April says, "We didn't. We weren't given a choice. We weren't allowed to test our tech plan. We weren't allowed a proof of concept. We weren't allowed to see proof that the Java platform could be stood up and launched quickly. There was no validation. The management layer failed." The Plan/Management cell gets an X. Things are looking grim for this board. We already have 4 X's, and we've scored fewer than half of the 9 cells on the WR board. Even if every remaining cell gets an O, we're still screwed.
Abe's team can now see the problem clearly, thanks to WR. They are locked into a technology choice they had no part in, and it was never properly validated. As a result, they are shackled to an unmotivated dev team that is generating massive resistance. The Team/Management cell gets an X now too, because the director of that team is not a strong enough manager to help the team find its motivation. Abe’s team is sobered. They all agree that at this point Abe can skip scoring the remaining cells, because the problem is so clear and intractable. April is right. He needs to dump the technology so he can dump the Dutch team. He needs to validate a new technology and hire a new team to execute the build. There’s still time to succeed with this build if he moves quickly.
In this imagined scenario, WR works well to break an overwhelming problem into clear component parts, and Abe is able to identify the problem quickly and work with his team toward a viable solution. Only one problem: the WR Framework didn’t exist 5 years ago, when Abe needed it. In reality, it took him and his team much longer to identify the root problem and understand what needed to be done, and Abe ended up with only 90 days to make a massive change to the plan and the dev team and get the site built.
In case you're wondering, Abe called us. Our company built a proof of concept for Abe and his team. We validated the technology plan. We proved our motivation by hitting all of the milestones we agreed to, and we launched the site on time and on budget despite the odds. But that's another case study for another Web Reliability day.