Learning from Mars: Fail Fast Isn’t an Option

This week we collectively experienced the culmination of years of design, planning, engineering and re-engineering through the arrival of multiple spacecraft at Mars. For me, it was a very personal moment: back in 2017, my colleague Karen and I began a six-month engagement with NASA/JPL to rethink and reimagine training design and delivery for the Perseverance mission. Even though it was yet another competency-based education and digital learning ecosystem project to both of us, I left that project a changed person. It was the first time in my life I experienced thinking where failure truly isn’t an option.

Yes, it was an incredible experience to work with so many people at JPL and to meet and learn from Martian exploration leaders like Nagin Cox. Engaging with her and the engineers, scientists, technologists and designers who support the missions, I saw that problem-solving and the pursuit of perfection at the lab are very different from what you find in all but a few places on the planet. At a project workshop, I kicked around design-thinking ideas with one of the lead scientists on the Europa Clipper mission. Responsibility, reality and risk underpinned our conversation on thinking differently about achieving mission success.

The NASA/JPL team is relentlessly driven to succeed against immovable deadlines and unfathomable challenges. With billions of dollars on the line, problems that have no known solutions and narrow windows of opportunity that may not come again for years or decades, failure is something they take very seriously. Space is hard and deep space is exponentially harder; you can’t crash the landing and try again next month.

I’ve grown to understand that failure is easy and success is hard. This belief is why I’ve had a fraught philosophical relationship with the concept of fail fast and fail often, which has become a mantra in consumer cloud services, aspects of software development and in several other innovation-centered circles. Granted, failure at JPL is absolutely part of the equation on Earth, but failure on Mars spells unrecoverable doom. There is definitely a switch from research and development to “production” when it comes to deep space, so why haven’t we learned that here on Earth?

Fail fast, fail often is a very dangerous concept if it is left to drive technology that has moved into production. If I started my daily-driver automobile and the engine immediately exploded, failing fast would put my life in jeopardy. Likewise, if my company paid me incorrectly every other week, failing often would likely drive me to another job. At NASA/JPL, the teams understand the difference between research and development and production. Production has little room, if any, to fail. Yet companies like Facebook and many other consumer-centric service providers continue to apply fail fast, fail often principles to in-production product development without consumers’ knowledge. The result has been our unwitting participation in an utter distortion of our global social fabric and the loss of information, because the consumer is often seen as a willing subject in large-scale experimentation. For the vast majority of consumers, no amount of terms-and-conditions fine print can justify this behavior; we don’t expect ourselves and our data to be treated this way. For what it’s worth, this concept isn’t new. Back in the early 1990s, Silicon Graphics released production workstations with beta versions of its IRIX operating system, which were loosely documented and, in my case, had very real bugs that came close to damaging systems in production.

Conversely, striving for ultimate perfection is often the nail in the coffin of innovation. By its very definition, perfection is a form of unobtainium; there will always be a better way of doing something if you wait until tomorrow. Death by committee is the ultimate bureaucratic expression of this: perfection can’t be attained until everyone thinks it’s perfect. Perfection is often subjective and therefore a moving target that cannot be attained. A startup, for example, couldn’t get off the ground if it were striving for ultimate perfection; the product would never be good enough for the market.

What JPL has learned, and what we visibly see with SpaceX during its testing efforts, is that fail fast, fail often belongs where it makes the most sense: in research and development, where high degrees of risk are anticipated, accepted and evaluated. Where neither organization plays fast and loose is in production. Launching a rocket and releasing a payload need to be flawless, with every possible risk accounted for. The mission, the production activity, is what matters.

I think it is time to reassess the validity of, and dogmatic adherence to, the fail fast, fail often mantra. When applied outside of a controlled research and development context, the unexpected consequences of failure can be far-reaching and devastating to individuals, customers and society. We all need to better understand that fail fast isn’t always an option.