How to do DevOps
Rebrand your ops/dev/any team as the DevOps team
Manage change with plans and architects
No plan survives contact with the enemy.
Helmuth von Moltke the Elder
SRE
Site Reliability Engineering
“specific implementation of DevOps with some idiosyncratic extensions.”
SRE is “what happens when a software engineer is tasked with what used to be called operations.”
50% ops (issues, on-call, and manual intervention)
50% development tasks (new features, scaling or automation)
Reduce organizational silos
SRE shares ownership with developers to create shared responsibility
SREs use the same tools that developers use, and vice versa
Implement gradual changes
SRE encourages developers and product owners to move quickly by reducing the cost of failure
SREs have a charter to automate manual tasks (called “toil”) away
Measure everything
SRE defines prescriptive ways to measure values
SRE fundamentally believes that systems operation is a software problem
What is a Risk
occurrence
severity
non-detectability
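The slide lists the three factors but does not say how they are combined. One common approach, borrowed from FMEA (failure mode and effects analysis), is to rate each factor on a 1 to 10 scale and multiply them into a single priority number. The sketch below assumes that scoring; the `Risk` class, the scale, and the sample ratings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A risk rated on three 1-10 scales (assumed FMEA-style scoring)."""
    name: str
    occurrence: int         # how often it is expected to happen
    severity: int           # how bad the impact is when it happens
    non_detectability: int  # how hard it is to notice before users do

    def priority(self) -> int:
        # Assumption: the three factors are multiplied together.
        for value in (self.occurrence, self.severity, self.non_detectability):
            if not 1 <= value <= 10:
                raise ValueError("ratings must be between 1 and 10")
        return self.occurrence * self.severity * self.non_detectability


def rank(risks: list[Risk]) -> list[Risk]:
    """Return the risks sorted from highest to lowest priority."""
    return sorted(risks, key=lambda r: r.priority(), reverse=True)


# Hypothetical sample ratings, for illustration only.
risks = [
    Risk("disk fills up", occurrence=6, severity=5, non_detectability=2),
    Risk("silent data corruption", occurrence=2, severity=9, non_detectability=9),
]
for risk in rank(risks):
    print(risk.name, risk.priority())
```

A rare but hard-to-detect failure can outrank a frequent one that monitoring catches immediately, which is why non-detectability is scored separately.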
Risks
Software fault tolerance
brittle, unusable product ⚖️ stable product that nobody wants to use
Testing
outages, leaks… ⚖️ lose your market
Push
Every push is risky
Canary duration and size
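To make the canary trade-off concrete: a larger or longer canary gives more signal, but exposes more traffic to a potentially bad release. The sketch below estimates that exposure. The function name, its parameters, and the sample traffic figures are assumptions for illustration, not part of the original slides.

```python
def canary_exposure(requests_per_second: float,
                    canary_fraction: float,
                    duration_seconds: float) -> float:
    """Estimate how many requests the canary serves during its observation window."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    if requests_per_second < 0 or duration_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    return requests_per_second * canary_fraction * duration_seconds


# Hypothetical service: 1000 requests/s, 1% canary, 10-minute window.
exposed = canary_exposure(1000, 0.01, 600)
print(f"requests at risk if the release is bad: {exposed:.0f}")
```

Doubling either the size or the duration doubles the number of requests at risk, so both knobs are usually tuned together.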
Costs
redundant resources
📉 opportunities
SLA / SLI / SLO
Service Level Agreement
commitment between a service provider and a client
Service Level Objective
Target values for SLIs
Service Level Indicator
Measure of the service level provided by a service provider to a customer
Time-based availability
uptime.is
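Time-based availability is usually computed as uptime / (uptime + downtime). The sketch below shows that ratio and the reverse calculation, the downtime a given target allows over a period, which is the kind of figure a downtime calculator reports. The function names and the 30-day period are assumptions for illustration.

```python
from datetime import timedelta


def time_based_availability(uptime: timedelta, downtime: timedelta) -> float:
    """Fraction of the observation window during which the service was up."""
    total = uptime + downtime
    if total <= timedelta(0):
        raise ValueError("observation window must be positive")
    return uptime / total  # timedelta / timedelta yields a float


def allowed_downtime(target: float, period: timedelta) -> timedelta:
    """Downtime that still meets an availability target over the period."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return period * (1.0 - target)


# Assumed example: a 99.9% target over a 30-day month.
print(allowed_downtime(0.999, timedelta(days=30)))
print(time_based_availability(timedelta(hours=99), timedelta(hours=1)))
```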
Aggregate availability
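Aggregate availability is commonly defined as successful requests / total requests, which suits services that are never entirely up or down. The sketch below assumes per-region request counters; the region names and counts are hypothetical.

```python
def aggregate_availability(successful_requests: int, total_requests: int) -> float:
    """Fraction of requests that were served successfully."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= successful_requests <= total_requests:
        raise ValueError("successful_requests must be between 0 and total_requests")
    return successful_requests / total_requests


# Hypothetical per-region counters: (successful, total).
regions = {
    "region-a": (999_500, 1_000_000),
    "region-b": (498_000, 500_000),
}
successful = sum(ok for ok, _ in regions.values())
total = sum(count for _, count in regions.values())
print(f"{aggregate_availability(successful, total):.4%}")
```

Summing the counters before dividing weights each region by its traffic, whereas averaging the per-region ratios would not.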
Error Budget
Determines how unreliable the service is allowed to be
Finding the right balance between innovation and reliability
Can be increased or decreased by top management
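An error budget is typically derived from the SLO as 1 - SLO. The sketch below expresses it in requests and checks how much is left. The function names, the sample numbers, and the rule of pausing risky pushes once the budget is spent are assumptions for illustration.

```python
def error_budget(slo: float, total_requests: int) -> float:
    """Number of requests allowed to fail while still meeting the SLO."""
    if not 0.0 <= slo <= 1.0:
        raise ValueError("slo must be between 0 and 1")
    return (1.0 - slo) * total_requests


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Unspent part of the error budget; negative means the SLO is violated."""
    return error_budget(slo, total_requests) - failed_requests


def can_push(slo: float, total_requests: int, failed_requests: int) -> bool:
    # Assumed policy: risky pushes are allowed only while budget is left.
    return budget_remaining(slo, total_requests, failed_requests) > 0


# Hypothetical quarter: 99.9% SLO, 1,000,000 requests, 400 failures.
print(round(budget_remaining(0.999, 1_000_000, 400)))
print(can_push(0.999, 1_000_000, 400))
```

Raising the SLO shrinks the budget and slows feature work; lowering it does the opposite, which is the balance between innovation and reliability.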
Chaos Team
Test your errors
Blameless Postmortem
Downtime or degradation
Data loss
On-call engineer intervention
A resolution time above some threshold
A monitoring failure
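The triggers above can be encoded as a simple check so that the decision to write a postmortem is not left to memory. This is a minimal sketch; the `Incident` fields and the one-hour threshold are hypothetical placeholders, since the slides do not specify a threshold.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Incident:
    """Facts recorded about an incident (hypothetical field names)."""
    downtime_or_degradation: bool = False
    data_loss: bool = False
    on_call_intervention: bool = False
    monitoring_failure: bool = False
    resolution_time: timedelta = timedelta(0)


# Assumption: placeholder value; each team sets its own threshold.
RESOLUTION_THRESHOLD = timedelta(hours=1)


def needs_postmortem(incident: Incident,
                     threshold: timedelta = RESOLUTION_THRESHOLD) -> bool:
    """Return True if any of the listed triggers applies."""
    return (
        incident.downtime_or_degradation
        or incident.data_loss
        or incident.on_call_intervention
        or incident.monitoring_failure
        or incident.resolution_time > threshold
    )


print(needs_postmortem(Incident(resolution_time=timedelta(hours=3))))
print(needs_postmortem(Incident()))
```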
Planning
It always takes longer than you expect, even when you take into account Hofstadter’s Law.
Hofstadter’s Law
Finishing projects
Adding manpower to a late software project makes it later.
Brooks’s Law
Empowerment
Individuals are less likely to offer help to a victim when other people are present; the greater the number of bystanders, the less likely it is that one of them will help
Bystander effect
Conway’s law
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure
— Melvin E. Conway
Component Team
optimized for delivering the maximum number of lines of code
focus on increased individual productivity by implementing ‘easy’ lower-value features
leads to ‘invented’ work and a forever-growing organization
dependencies between teams lead to additional planning
waterfall
exploits existing expertise; lower level of learning new skills
https://less.works/less/structure/feature-teams.html
Feature Team
“Spotify Model”
The “Spotify Model” is not an Agile Method
Don’t scale agile… descale your organization
Valve: Cabal
Self-organized multidisciplinary project team
Form organically
People decide to join the group based on their own belief
Structure changes according to new requirements
p15 Valve_NewEmployeeHandbook.pdf
OKR
Objective
a clearly defined goal
Key Results
specific measures used to track the achievement
The goal of OKR is to define how to achieve objectives through concrete, specific and measurable actions
1 quarter
Public
Can be shared across the organization
Voyager
Join another team for 1 sprint/quarter to help achieve a needed Objective
Peer Review Evaluation
No one other than your peers can evaluate what you did to achieve Objectives
Build your own model
Depending on:
Change when needed
Today’s work is the legacy of tomorrow.
Everything-as-a-Service (EaaS / XaaS)
Pair programming
Review
Changes have to be read and merged by other members of your team.
Everyone is responsible for them afterwards.
Declaration over Convention
Your conventions may not be those of other teams
Eat Your Own Dog Food
You have to use your own product to know how your users feel
Test everything
Never assume it works
Brown Bag Lunch (BBL)
OpenSpaces