This post is part of the Everything is Code article and it aims to give some hints on how to approach DevOps in the context of my little formula:
DevOps is better development practices
In the previous chapters, I stressed how DevOps is key to high deployment frequency, which in turn is key to (IT) agility. This concept is well accepted within the communities I talk to. However, organizations sometimes tell me: “Our products do not need to evolve so fast and we don’t need/want this type of deployment frequency”.
Please let me be opinionated on this matter: DevOps is better development practices that not only allow you to deploy more frequently but also enable you to deploy better, more robust software in general.
While I warmly invite you to read the “2018 State of DevOps Report” published by DORA to get insightful data collected from over 30,000 surveys, I would like to briefly mention that the analysis shows how top DevOps performers not only deliver faster, but also deliver more, with fewer failures/bugs. They are also able to recover faster.
So, what’s DevOps?
The definition that I like the most is
The union of people, process, and tools to deliver continuous value to customers
This definition fits well with the “Culture” multiplier in my formula and puts continuous value at the center. It requires bringing application engineering (development), operations engineering, quality engineering, and security engineering together, breaking down practices that were once siloed.
Improved coordination and collaboration across these disciplines reduces the time between when a change is committed to a system and when that change is placed into production. It also ensures that standards for security and reliability are met as part of the process.
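To make the "time between commit and production" tangible, it can be measured per change. The following is only a minimal sketch: the timestamps are made-up sample data and the function name `lead_time_hours` is hypothetical. In practice, the two timestamps would come from your version control and deployment tooling.

```python
from datetime import datetime
from statistics import median


def lead_time_hours(committed: datetime, deployed: datetime) -> float:
    """Hours between a commit and its arrival in production."""
    return (deployed - committed).total_seconds() / 3600


# Made-up sample data: (commit time, production deployment time).
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 15, 0)),
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 6, 10, 0)),
    (datetime(2024, 3, 6, 8, 0), datetime(2024, 3, 6, 20, 0)),
]

lead_times = [lead_time_hours(c, d) for c, d in changes]
median_lead_time = median(lead_times)
print(f"Median lead time: {median_lead_time:.1f} hours")
```

Tracking the median rather than the average keeps a single slow change from hiding an otherwise healthy trend.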
“Pills of DevOps”
Many of the things that you will find below in a compact form are described in more detail at DevOps at Microsoft.
Because DevOps relies on an agile mindset, you may want to familiarize yourself with What is agile? if you have not done so already.
As a reminder, there is no dogmatic truth in what is listed here. I am just sharing some practices that have delivered good results for the teams that have been using them, at Microsoft and among organizations with which I have been working.
I have grouped my thoughts around 1) DevOps Teams, 2) Planning and Release Management, 3) Quality Assurance, and 4) Live Site Management.
1) DevOps Teams
“Food for thought”:
- teams fully own one or more microservices and consist of 10-12 engineers and 1 program manager (or service manager or product owner)
- teams are vertical, covering the front-end, middle, and data/back-end layers
- teams own development, testing, and deployment of features, and are also responsible for the service to run smoothly in production
- Comment: It may sound a bit harsh, but if what you put into production can keep you in the office at night, it is in your own interest to deploy high-quality code…
- teams are (ideally) physically in the same room so that communication runs continuously without the need for meetings (except the daily scrums)
- teams are self-managing and intact for 12-18 months
- teams are self-forming. After these 12-18 months, Program Managers present the high-level roadmap of their services, and Engineers can state, in order of preference, which services they would like to work on for the next 12-18 months
- Comment: This increases job happiness, reinforces the DevOps culture, and enables cross-pollination.
2) Planning and Release Management
One of the questions that I am asked the most is how we do release management at Microsoft. I normally like to use the Azure DevOps team as an example because they apply the “highest form of dogfooding” by using Azure DevOps to build Azure DevOps. This also allows me to refer to the following articles, which describe well the approach they follow:
* today Azure DevOps.
Their approach to planning covers alignment and autonomy:
- Alignment represents the big picture in light of the business goals. It includes the product strategy over a period of 12-18 months (product roadmap) and (high-level) feature planning over a period of 6 months.
- Autonomy covers the details about what will be delivered to achieve the business goals. It includes Stories and Tasks.
“Food for thought”:
- Release sprints are 3 weeks long
- Comment: This length comes from trial and error. 2 weeks has proven to be too intense, 4 weeks too long.
- Forward planning covers the next 3 sprints
- Within the sprints, some teams plan only with tasks that take no longer than 4 hours.
- Comment: This is very powerful because if a task is planned for today and it is not completed, it will be discussed in the daily scrum of tomorrow. This enables extremely fast handling of issues.
- Trunk-based development is great and it virtually eliminates the “merge debt”
- Feature Flags are used to deploy fast and to make features available to selected (test) groups. They also spare you rollbacks, which are complicated when you are deploying in several regions or rings.
- Canary Releases are used to validate changes incrementally
- Aim to commit a lot and never roll back!
Following this approach, feedback flows in fast from pull requests, daily scrums, testing, and users, considering that code goes into production every 3 weeks. This enables continuous adaptation and adjustment of deliverables to maximize customer/user experience.
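The interplay between feature flags, rings, and canary releases mentioned in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not how Azure DevOps implements it: the class `FeatureFlag`, the ring names, and the percentage-based canary are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    name: str
    # Rings (deployment groups) where the feature is on for everyone.
    enabled_rings: set[str] = field(default_factory=set)
    # Percentage of users outside those rings who also get the feature.
    canary_percent: int = 0

    def is_enabled(self, user_id: str, ring: str) -> bool:
        if ring in self.enabled_rings:
            return True
        # Hash flag name + user so each user lands in a stable bucket 0..99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < self.canary_percent

    def turn_off(self) -> None:
        """Disable the feature everywhere without redeploying or rolling back."""
        self.enabled_rings.clear()
        self.canary_percent = 0


flag = FeatureFlag("new-dashboard", enabled_rings={"ring0"})
print(flag.is_enabled("alice", "ring0"))  # inner ring: feature is on
print(flag.is_enabled("alice", "ring3"))  # outer ring, 0% canary: feature is off

flag.canary_percent = 100                 # widen the canary to everyone
print(flag.is_enabled("alice", "ring3"))

flag.turn_off()                           # a problem shows up: switch it off
print(flag.is_enabled("alice", "ring0"))
```

The code stays deployed in every ring; only the flag state changes. That is why switching a flag off is so much simpler than rolling back a deployment across regions.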
3) Quality Assurance
By increasing your deployment frequency:
- you will be dealing with a smaller amount of code to test
- you will have a smaller number of bugs to deal with
- you will exercise your testing skills more and become more proficient at testing
- Test teams no longer exist in this setup
- DevOps teams are responsible for the quality of their code
- Pull Requests (PR) ensure that only “good code” merges to master. To complete a PR, DevOps teams at Microsoft need:
- Bug-cap is a great way to foster quality
- If there are more than 5 bugs per engineer in a team, work on new features is stopped to focus entirely on bug fixing.
- Shift Left: The pull request flow gives a common point to force testing, code review, and error detection early in the pipeline. This helps shorten the feedback cycle to developers. Errors are usually detected very fast. This also gives confidence when refactoring, since all changes are tested all the time.
- Shift Right: There is no place like production. The shift-right part of the strategy is a set of practices for both safeguarding production and ensuring quality in production.
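The bug-cap rule above is simple arithmetic, and a short sketch shows where the line falls. The function names are hypothetical; only the limit of 5 bugs per engineer comes from the practice described here.

```python
BUGS_PER_ENGINEER = 5  # the per-engineer limit mentioned above


def bug_cap(engineers: int) -> int:
    """Maximum number of open bugs a team may carry."""
    return BUGS_PER_ENGINEER * engineers


def feature_work_allowed(open_bugs: int, engineers: int) -> bool:
    """Feature work continues only while the team is at or under its cap."""
    return open_bugs <= bug_cap(engineers)


print(bug_cap(10))                   # cap for a team of 10 engineers
print(feature_work_allowed(50, 10))  # exactly at the cap
print(feature_work_allowed(51, 10))  # one bug over the cap
```

For a team of 10 engineers the cap is 50 open bugs: at 50 the team keeps building features, at 51 it switches entirely to bug fixing.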
4) Live Site Management
As mentioned above, DevOps teams are also responsible for live-site issues and interruptions, that is, for keeping their code running smoothly in production.
- To provide focus and assist with an interrupt culture, each DevOps team self-organizes into 2 distinct sub-teams:
- F-Team (Feature) works on committed features (new work)
- L-Team (Live-Site) deals with all live-site issues and interruptions.
- Typically, 2 engineers take turns forming the L-Team, which is 100% focused on keeping the services running smoothly.
- When no issue is on the radar, they work on dashboards, monitoring improvements, and similar, but not on new features! That is the job of the F-Team only.
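The split into F-Team and L-Team can be planned as a simple rotation. The sketch below makes assumptions that the text above does not state: the L-Team changes once per sprint, engineers rotate in a fixed order, and the names and the function `l_team_schedule` are made up for illustration.

```python
def l_team_schedule(engineers: list[str], sprints: int, size: int = 2):
    """For each sprint, return (L-Team, F-Team) with `size` engineers on live-site duty."""
    schedule = []
    n = len(engineers)
    for sprint in range(sprints):
        start = (sprint * size) % n
        # Wrap around the roster so everyone takes a turn.
        l_team = [engineers[(start + i) % n] for i in range(size)]
        f_team = [e for e in engineers if e not in l_team]
        schedule.append((l_team, f_team))
    return schedule


team = ["Ana", "Ben", "Chen", "Dee", "Eli"]
for number, (l_team, f_team) in enumerate(l_team_schedule(team, sprints=3), start=1):
    print(f"Sprint {number}: L-Team={l_team} F-Team={f_team}")
```

A fixed, visible rotation means everyone knows in advance when they are on live-site duty, and the F-Team can plan its feature work without expecting interruptions.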
One important piece of advice…
You don’t have to be perfect from the beginning. A good approach to maturing your DevOps practices is to start from where it hurts most. When this pain is cured, move to the next one that hurts the most. I have seen great progress with this simple approach, while I have seen great struggles with teams that were aiming for perfection from the beginning.