As engineers, we expect our systems and applications to be reliable, and we often test for that at a small scale or in development. But as you scale up and your infrastructure footprint grows, the assumption that conditions will remain stable no longer holds. Reliability at scale does not mean eliminating failure; failure is inevitable. How can we get ahead of these failures, and do so continuously?
Ana Margarita Medina, a Staff Developer Advocate at Lightstep, and Darko talk all things SRE (Site Reliability Engineering) and DevOps. They discuss the finer points of both and the differences between them.
How does your team prepare for failure and learn from incidents? GameDays are a time for a team and an organization to come together to explore failure and learn from it. The practice has been adopted across many industries, from fire departments to technology companies.
Keptn integrates with most of the great projects in the CNCF landscape (and beyond) that help teams deliver and operate their cloud native workloads. The integration happens through open event standards (CloudEvents, CDEvents), which is why Keptn makes it easy to connect tools and orchestrate the application lifecycle regardless of your toolchain. What could make Keptn better? Upstreaming and generalizing the best of it!
Chaos Engineering lets you compare what you think will happen with what actually happens in your systems. You literally break things on purpose to learn how to build more reliable systems. Lenny Sharpe walks you through Chaos Engineering at Target, covering the tools and practices you need to implement Chaos Engineering with Kubernetes in your organization. Even if you're already practicing Chaos Engineering, you'll learn new ways to use it to improve the reliability of your network and services. Ana Medina will demonstrate how you can practice Chaos Engineering on Kubernetes and use it to improve the reliability of your systems.
Our panel of experienced SRE practitioners reflects on how SRE has changed, how to adapt SRE to the unique aspects of each organization, and how software and tools continue to evolve to meet the changing needs of SREs. Mitch Ashley (Techstrong Group) and Austin Parker (Lightstep) host a conversation with our panel of recognized experts while interacting with our live online audience.