Designing a Shift-Left Incident Response Plan
by Dan Holloran, on Nov 7, 2019 3:00:21 PM
Complex microservices and containers, cloud-based architecture and CI/CD are helping developers deliver software faster and faster. But at what cost? With new development technologies popping up quickly and organizational structures constantly changing, reliability can take a backseat to speed. Operations teams are being asked to continuously deploy services faster than ever without being given ample context or support. So, how do developers and IT professionals work more closely to continuously test and deliver highly performant applications and services in production?
That’s where the shift-left mindset comes into play. Developers are allowing IT operations teams to get involved earlier in the development lifecycle to conduct testing during the process, not afterward. This allows QA to happen at the same time that development takes place – leading to shorter waiting periods between releases. To the end-users taking advantage of your application or service, an Agile development process is only as effective as the IT operations supporting it.
What is shift-left?
Even if your development and product teams plan, code and build services faster than ever before, they still need a process for deploying these services at the same pace. Without one, the release management process will actually slow down because faster development creates a massive backlog of work waiting to ship. Then, without appropriate context, IT teams are forced to figure out how to prioritize their releases.
So, engineering and IT teams are adopting DevOps and a shift-left mindset to combat this problem. IT operations are continuously running tests through staging (and production) environments, conducting automated and manual QA and involving themselves in the planning phase of software development. This leads to DevOps-minded teams who are prepared to ship code faster AND more reliably.
Then, with DevOps alongside a shift-left mindset, developers can’t simply wash their hands of a feature once applications and services are deployed to production. In a shift-left world, developers are put on-call to help IT teams support the applications and services they build. On-call responsibilities, incident management and response shouldn’t fall solely on the shoulders of IT professionals who likely had nothing to do with the incident’s occurrence.
Traditional incident response in DevOps and IT
Traditional software development and IT operations naturally broke the teams into silos. Development wrote the code and built the service while operations teams were responsible for maintaining IT infrastructure and deploying the services to production. However, as teams scale and services grow, this simply isn’t a sustainable practice. No matter how good your developers are, the siloed model pressures IT operations into taking full accountability for uptime and performance – including fixing problems quickly when they pop up.
This leads to an incident response process where sysadmins, database admins and security analysts all have no idea why problems pop up in production. So now, instead of immediately working to fix the incident, IT professionals are being forced to figure out what’s happening first. It also means developers don’t need to be on-call for issues and aren’t easily available for escalation – even if it’s something they’re familiar with and could resolve in minutes.
So, IT is shifting further left in the development process and developers are pushing a little further right, creating a more collaborative software delivery system focused on transparency and accountability.
Continuous testing, monitoring and alerting throughout the software development lifecycle
IT professionals are contributing earlier in the development process and exposing vulnerabilities and weaknesses before they reach production. Through constant monitoring and alerting, even in staging environments, and continuous testing across all applications and infrastructure, IT operations is no longer being ignored in the development phase. In planning meetings, operations teams can voice their concerns, create testing frameworks and work with developers to ensure successful releases continue to go out.
With earlier collaboration, developers and operations teams can prevent blockers and maintain a constant flow of work through the development queue. With better monitoring and alerting, all of engineering and IT can share visibility into processes, metrics, logs and dashboards that inform them of what’s happening in development and with production environments. This is the beginning of a truly DevOps-forward organization. Then, when something breaks in production, developers and IT professionals have a seamless way to communicate and route alerts between people and teams.
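To make the idea of routing alerts between people and teams a little more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular incident response tool works: the service names, team names and routing rules are all hypothetical assumptions. The rule it encodes is the one described above: operations stays first in line for production, the developers who own a service are pulled in for critical issues, and pre-release alerts go straight to the owning developers.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    service: str
    environment: str  # "staging" or "production"
    severity: str     # "warning" or "critical"


# Hypothetical ownership map: which development team built which service.
OWNERS: Dict[str, str] = {
    "checkout-api": "payments-dev",
    "search": "search-dev",
}
OPS_TEAM = "it-ops"  # hypothetical name for the operations on-call rotation


def route(alert: Alert) -> List[str]:
    """Return the teams to notify for an alert, in notification order."""
    owner = OWNERS.get(alert.service)
    if alert.environment != "production":
        # Pre-release problems go to the developers who own the service,
        # falling back to operations when no owner is registered.
        return [owner] if owner else [OPS_TEAM]
    # In production, operations is notified first; critical alerts also
    # page the owning developers so they are available for escalation.
    recipients = [OPS_TEAM]
    if alert.severity == "critical" and owner:
        recipients.append(owner)
    return recipients
```

With these assumed rules, a critical production alert on `checkout-api` notifies `it-ops` and then `payments-dev`, while a staging warning on `search` goes only to `search-dev`.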
Whether incidents occur in production or pre-release, you need to know when something breaks. And, it’s better to know about a problem before it actually affects end-users. So, continuous testing, monitoring and alerting are must-haves for any highly efficient DevOps-oriented, shift-left incident response plan.
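As a rough illustration of knowing about a problem before it affects end-users, the sketch below applies the same error-rate check to staging and production. The threshold values and the `check` function are assumptions made up for this example; real limits depend on your services and your monitoring stack.

```python
from typing import Optional

# Assumed error-rate limits per environment (fraction of failed requests).
# Staging is allowed to be noisier than production in this sketch.
THRESHOLDS = {"staging": 0.05, "production": 0.01}


def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def check(environment: str, errors: int, requests: int) -> Optional[str]:
    """Return an alert message if the error rate exceeds the limit, else None."""
    rate = error_rate(errors, requests)
    limit = THRESHOLDS[environment]
    if rate > limit:
        return f"ALERT [{environment}] error rate {rate:.1%} exceeds {limit:.1%}"
    return None
```

Running the same check against staging means a regression can raise an alert there first, before the release ever reaches production traffic.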
Dedication to proactive service resilience
A DevOps mindset and the adoption of a shift-left model allow teams to proactively identify incidents and create resilient systems. Unit tests, continuous automated tests, QA, and constant alignment between configurations give developers confidence that their code will work in production. IT professionals can feel better about encouraging developers to ship code quickly without having to wait long periods of time to deploy it. The shift-left philosophy shares accountability for service resilience across all teams – lightening the load on IT operations and helping teams build more robust systems.
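One way to picture "constant alignment between configurations" is a small drift check that compares staging and production settings and could run as part of an automated test suite. This is only a sketch under assumptions: the configuration keys, the `config_drift` helper and the idea of an allow-list for intentionally different keys are all hypothetical.

```python
from typing import Any, Dict, FrozenSet, Tuple

MISSING = "<missing>"  # marker for a key that one environment does not define


def config_drift(
    staging: Dict[str, Any],
    production: Dict[str, Any],
    allowed: FrozenSet[str] = frozenset(),
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (staging_value, production_value)} for every key that differs.

    Keys listed in `allowed` are expected to differ (for example hostnames)
    and are skipped.
    """
    drift = {}
    for key in sorted(set(staging) | set(production)):
        if key in allowed:
            continue
        s_val = staging.get(key, MISSING)
        p_val = production.get(key, MISSING)
        if s_val != p_val:
            drift[key] = (s_val, p_val)
    return drift


# Example with made-up settings: the hostname is expected to differ,
# but the timeout mismatch and the missing retry setting are flagged.
staging_cfg = {"db_host": "db.staging.local", "timeout_s": 30, "retries": 3}
production_cfg = {"db_host": "db.prod.local", "timeout_s": 10}
drift = config_drift(staging_cfg, production_cfg, allowed=frozenset({"db_host"}))
```

A test that fails whenever `drift` is non-empty would surface configuration mismatches during development instead of during an incident.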
Adopting DevOps principles for efficient workflows
DevOps principles tighten feedback loops and improve collaboration between IT and engineering across all aspects of the software development and incident management lifecycles. With faster detection and routing of incidents, often before they reach production, DevOps teams can ensure resilient services and uptime for customers. Then, when something does go wrong, developers are available to help on-call operations teams quickly update code and push changes to production environments.
When there’s workflow transparency across all developer processes and IT operations, everyone has deeper exposure to staging AND production environments, leading to more collaborative engineering.
From Git commits to uptime monitoring metrics, track everything in a centralized timeline and create intelligent alert rules to identify issues faster and make on-call suck less. Learn how VictorOps, a centralized incident response tool, can help you manage on-call schedules, digest contextual monitoring data and route alerts to the right people at the right times. Sign up for a 14-day free trial or request a free personalized demo today.