On February 3, 2019, the Sumo Logic platform experienced the biggest spike in incoming data and analytics usage in the company’s history. On this day, close to everybody in the U.S., and many more people across the world, experienced a massive sports event: Super Bowl LIII. The spike was caused by viewers across the world tuning in to the football game via online streaming video. CBS Sports Digital was pushing out the live streams via multiple content delivery networks (CDNs). All the CDN and mediastore access logs were pushed into Sumo Logic for real-time monitoring of the viewer experience. I am incredibly proud to report that on this day, the only thing more boring than the actual game was being in our war rooms. We watched the game stream. That was all there was to it.
Some statistics: during the game, our production deployment elastically grew to about 2x its usual size to accommodate the load coming from CBS. They were sending data at more than 3x the rate any single customer had ever sent us (interestingly enough, these records have since been broken already, more on that another time). But this was not solely an exercise in scaling up data ingestion. I am sure you are not surprised to learn that we track analytics performance as obsessively as ingestion latency. I am happy to report that our median search performance was below two seconds and was overall well within our internal SLOs, providing the responsiveness you have all come to appreciate, even under these conditions.
I believe that there are three fundamental reasons why we managed to pull this off without a hitch. The first two go back to the two philosophical convictions that drove us to start Sumo Logic in the first place. The first conviction: from our previous experience, it was clear to us that we should stop burdening customers with running the system and instead let them focus on using the system! From this, we derived the imperative to deliver log management, machine data analytics, and, in today’s terminology, Observability as a Service.
The second conviction: don’t struggle to build for the largest customer (as we had done in our careers previously). Instead, build for ALL customers. This is why Sumo is not just any old SaaS or “cloud” thing. This is why Sumo is a fully multi-tenant service. This is why a temporary spike coming from one customer does not become a doomsday scenario. Sumo is big, make no mistake. Big, but nimble: built to scale elastically at a moment’s notice. It is obvious today that this is how it should work. It wasn’t ten years ago. People didn’t believe that such a SaaS could scale. Ermahgerd!
Lastly, as it goes with all success stories, there was, of course, a lot of preparation going on behind the scenes. The folks from CBS didn’t just turn up the morning of the game. Instead, we started talking with them about their plans in late 2018. I can safely say that never before have so many Sumo engineers watched the NFL playoffs with this much interest! We used the playoff games, which of course were also streamed, as increasingly intense dress rehearsals. And in the background, our Quality Engineering team tirelessly added even more tooling to our stress testing artillery. Behind the scenes, we ruthlessly beat the crap out of our internal deployments while moving to the production systems. We are now in possession of a truly fear-inducing armada of logs and metrics panserbjørne that we continue to unleash internally.
Throughout the playoffs and then the final game, we had internal war rooms across all three engineering locations (Redwood City, USA; Warsaw, Poland; and New Delhi, India) to make sure we had all hands on deck. Our Site Reliability Engineering team (supported by a large number of devs and QE, because that’s how we roll) was continuously paying attention in real time. Thanks to all the preparation, those hands ended up idle, but for us and for our customer, that was the best possible outcome. We also hung out in shared Slack channels during the games and beyond, coordinating with the customer engineering team. And we had folks embedded with CBS Sports Digital in their San Francisco war room! We take pride in our core value of ‘Being In It With Our Customers’ and, once again, this obsession became the driving force of a successful event.
For more technical and operational details from the CBS team, see this great post on CBS Sports Digital’s Strategy For Streaming The Super Bowl.