Scaling A Start-up DevOps Team To 10x
While Scaling The System 50x
Christian Beedgen – Co-Founder & CTO
Stefan Zier – Lead Architect
DevOpsDays Austin 2014
Intro
Christian Beedgen
– Co-Founder, CTO
– ArcSight, Amazon, …
– No prior experience running production systems
Stefan Zier
– Lead Architect, first engineer
– ArcSight, Amazon, …
– No prior experience running production systems
Scaling
Spreading constructive beliefs and behavior
from the few to the many.
Robert I. Sutton
Scaling up Excellence: Getting to More Without Settling for Less
The Sumo Logic Service
Petabyte scale log management platform
Big Data™, High Velocity, Human Real Time
Distributed
100% in AWS
Service Oriented Architecture
99% in Scala
Run by engineers
[Chart: Data Ingest]
[Chart: Code Commits, Services]
[Chart: Engineering Head Count, axis from 0 to 60]
Sumo Logic Confidential
The Challenge
Scaling Sumo Logic
– More confidence and uptime
– More operators
– More change
– More services
How We Scaled
DevOps Culture
Spreading Knowledge
Control surfaces
Culture
a shared, learned, system of values,
beliefs and attitudes that shapes and
influences perception and behavior — an
abstract “mental blueprint” or “mental
code.”
Being On Call
One week, 24/7 responsibility for
– Operational decision making
– Alert response
– Deploying the bits
– Configuration changes
Pair of people (primary, secondary)
– Social schedules & travel
– Training
– Relief after a noisy night
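The weekly primary/secondary pairing can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the names `OnCallSketch`, `Shift` and `rotation`, the example roster, and the pairing rule (the secondary is the next person in the roster) are hypothetical and do not come from the talk.

```scala
import java.time.LocalDate

// Hypothetical sketch: weekly on-call pairs drawn from a roster.
// The pairing rule (secondary = next person in the roster) is an assumption.
object OnCallSketch {
  final case class Shift(start: LocalDate, primary: String, secondary: String)

  // Builds `weeks` consecutive one-week shifts, the first one starting at `start`.
  def rotation(roster: Vector[String], start: LocalDate, weeks: Int): Vector[Shift] = {
    require(roster.size >= 2, "a pair needs at least two engineers")
    Vector.tabulate(weeks) { i =>
      Shift(
        start = start.plusWeeks(i.toLong),
        primary = roster(i % roster.size),
        secondary = roster((i + 1) % roster.size)
      )
    }
  }

  def main(args: Array[String]): Unit =
    rotation(Vector("ana", "ben", "cho"), LocalDate.of(2014, 5, 5), 4).foreach(println)
}
```

With this rule, each week's secondary becomes the next week's primary, which gives a natural hand-over and a training path for newer engineers.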
Feedback Loops
Sumo on Sumo
– Perfect dogfooding use case
Post mortems
– Drive improvements from incidents
Alerting
– Code I wrote yesterday just woke me up at 4am
Change Management
Mandated for PCI compliance
– Change Management Board = Channel on Slack
– Change Request = JIRA ticket
– Audit trail = Paste Slack conversation into JIRA
Actually helpful
– Good documentation
– Starts good discussions
– Makes change mindful
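The "paste the conversation into the ticket" step might look like the following sketch. It only formats text. It does not call any real Slack or JIRA API, and `ChangeAuditSketch`, `ChatMessage`, `auditTrail` and the ticket key are hypothetical names used for illustration.

```scala
// Hypothetical sketch: turn a change-channel discussion into an audit-trail
// comment for the change ticket. No real Slack or JIRA API is used here.
object ChangeAuditSketch {
  final case class ChatMessage(time: String, author: String, text: String)

  // Renders the discussion as plain text: a header, then one line per message.
  def auditTrail(ticketKey: String, messages: Seq[ChatMessage]): String = {
    val header = s"Change discussion for $ticketKey"
    val lines = messages.map(m => s"[${m.time}] ${m.author}: ${m.text}")
    (header +: lines).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    val discussion = Seq(
      ChatMessage("10:02", "ana", "Requesting approval to raise receiver heap"),
      ChatMessage("10:05", "ben", "Approved, please deploy after 11:00")
    )
    println(auditTrail("CHG-123", discussion))
  }
}
```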
Spreading Knowledge
Tactical
– Daily Standups
– Chat
– Playbooks
Strategic
– Mentoring
– “How the sausage is made” sessions
– Checklists
Playbooks
Linked to alert
– GitHub wikis
– URL in alert
Focused on MTTR
– Steps to restore service
– List of Subject Matter Experts to call
Continuously improved
– Boy Scout rule
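One way to enforce "URL in alert" is to make the playbook link a required field of every alert definition. The sketch below assumes this approach. `PlaybookAlertSketch`, `Alert`, `validate`, `notification` and the example wiki URL are hypothetical, not the format Sumo Logic actually used.

```scala
// Hypothetical sketch: every alert definition must carry a playbook URL,
// so the notification can point the on-call engineer at the restore steps.
object PlaybookAlertSketch {
  final case class Alert(name: String, summary: String, playbookUrl: String, experts: List[String])

  // Rejects alert definitions whose playbook link is missing or not a web URL.
  def validate(alert: Alert): Either[String, Alert] =
    if (alert.playbookUrl.startsWith("https://") || alert.playbookUrl.startsWith("http://")) Right(alert)
    else Left(s"alert '${alert.name}' has no playbook URL")

  // Renders the text that would be sent to the on-call pair.
  def notification(alert: Alert): String =
    Seq(
      s"ALERT ${alert.name}: ${alert.summary}",
      s"Playbook: ${alert.playbookUrl}",
      s"Experts: ${alert.experts.mkString(", ")}"
    ).mkString("\n")

  def main(args: Array[String]): Unit = {
    val alert = Alert(
      "receiver-queue-depth",
      "ingest queue is backing up",
      "https://wiki.example.com/playbooks/receiver-queue-depth",
      List("ana", "ben")
    )
    validate(alert) match {
      case Right(ok) => println(notification(ok))
      case Left(err) => println(err)
    }
  }
}
```

Running such a check when alerts are defined, rather than when they fire, keeps an alert without a playbook from ever reaching the on-call pair.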
Three Pillars
Culture
Knowledge
Control surfaces
Checklists
Improve outcomes
– Ensure experts don’t miss any critical steps
– Prevent repeating mistakes
Well designed
– Coherent
– Living documents
– Concise, clear and require specific actions
– Need to be short and well-organized
– Are NOT step-by-step instructions
DevOps Friendly
Control Surfaces matter for scale
– Simplify complex operations
– Consistent view
– Built-in safety
Natural to use
– Easy to learn, discover
Natural to extend
– Every developer
dsh
– CLI
– Full stack
– Fast
– Safe
– Secure
– Proactive
– Discoverable
Model Driven
Creates consistency
Provides guard rails
Deployment
– Cluster
• Instance
– Assembly
Configured at all levels
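A model that is "configured at all levels" can be sketched as nested case classes, each with its own settings, plus a function that resolves the effective settings for one instance. Everything here is an assumption for illustration: the names in `ModelSketch`, the setting keys, and the override order (instance over cluster over deployment). Assembly is left out because the slide does not say what it contains.

```scala
// Hypothetical sketch of a deployment model with settings at every level.
// The override order (instance over cluster over deployment) is an assumption.
object ModelSketch {
  type Settings = Map[String, String]

  final case class Instance(id: String, settings: Settings = Map.empty)
  final case class Cluster(name: String, instances: List[Instance], settings: Settings = Map.empty)
  final case class Deployment(name: String, clusters: List[Cluster], settings: Settings = Map.empty)

  // Effective settings for one instance: more specific levels win.
  // Returns None when the cluster or instance is not part of the model.
  def effective(d: Deployment, clusterName: String, instanceId: String): Option[Settings] =
    for {
      c <- d.clusters.find(_.name == clusterName)
      i <- c.instances.find(_.id == instanceId)
    } yield d.settings ++ c.settings ++ i.settings

  def main(args: Array[String]): Unit = {
    val api = Cluster(
      "api",
      List(Instance("api-1"), Instance("api-2", Map("heap" -> "8g"))),
      Map("port" -> "8080")
    )
    val prod = Deployment("prod", List(api), Map("heap" -> "4g", "region" -> "us-west-2"))
    println(effective(prod, "api", "api-1"))
    println(effective(prod, "api", "api-2"))
  }
}
```

Because unknown clusters and instances resolve to `None`, a tool built on such a model can refuse to act on targets that do not exist, which is one form of guard rail.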
daemon restart api:p:25,receiver:p:10
dsh
– Scala
– Model based
– Trivial to extend
– Specific to OUR needs
– Meaningful defaults
– Prevents mistakes
// Restart the api service on every running instance in the "zk" cluster.
val filter = FilterBuilder.withCluster("zk")
  .withOnlyRunningInstances
  .build()
val instances = deployment.connect.describeInstances(filter)
// .par processes the instances in parallel.
instances.par.foreach { instance =>
  val ssh = instance.connectSSH
  ssh.execute("sudo service api restart")
}
What would we do differently next time?
Upgrade the system in a less monolithic way
Don’t ask UI developers to do operations
Clearer guidelines on managers & operations
Next Experiments
Divide up big rotation
Bring India development team into rotation
Switch from 24/7 shifts to 12/7
Deploy smaller parts of the system more often
Bring full-time operations people into the mix
Thank You!
Christian Beedgen
@raychaser
Stefan Zier
@stefanzier
We’re hiring!
go.sumologic.com/jobs

Scaling a Start-up DevOps team to 10x while scaling the system 50x

  • 1. Scaling A Start-up DevOps Team To 10x While Scaling The System 50x Christian Beedgen – Co-Founder & CTO Stefan Zier – Lead Architect DevOpsDays Austin 2014
  • 2. Christian Beedgen – Co-Founder, CTO – ArcSight, Amazon, … – No prior experience running production systems Stefan Zier – Lead Architect, first engineer – ArcSight, Amazon,… – No prior experience running production systems Intro 2
  • 3. 3 Scaling Spreading constructive beliefs and behavior from the few to the many. Robert I. Sutton Scaling up Excellence: Getting to More Without Settling for Less
  • 4. 4
  • 5. Petabyte scale log management platform Big Data™, High Velocity, Human Real Time Distributed 100% in AWS Service Oriented Architecture 99% in Scala Run by engineers The Sumo Logic Service 5
  • 8. Engineering Head Count Sumo Logic Confidential8 0 10 20 30 40 50 60
  • 9. The Challenge 9 Scaling Sumo Logic – More confidence and uptime – More operators – More change – More services
  • 10. 10
  • 11. How We Scaled: DevOps Culture; Spreading Knowledge; Control Surfaces.
  • 12. Culture: a shared, learned system of values, beliefs and attitudes that shapes and influences perception and behavior; an abstract “mental blueprint” or “mental code.”
  • 13. Being On Call. One week, 24/7 responsibility for: operational decision making; alert response; deploying the bits; configuration changes. Pair of people (primary, secondary): social schedules & travel; training; relief after a noisy night.
  • 14. Feedback Loops. Sumo on Sumo: perfect dogfooding use case. Post mortems: drive improvements from incidents. Alerting: code I wrote yesterday just woke me up at 4am.
  • 15. Change Management. Mandated for PCI compliance: Change Management Board = channel on Slack; Change Request = JIRA ticket; audit trail = paste Slack conversation into JIRA. Actually helpful: good documentation; starts good discussions; makes change mindful.
  • 17. Spreading Knowledge. Tactical: daily standups; chat; playbooks. Strategic: mentoring; “How the sausage is made” sessions; checklists.
  • 19. Playbooks. Linked to alert: GitHub wikis; URL in alert. Focused on MTTR: steps to restore service; list of Subject Matter Experts to call. Continuously improved: Boy Scout rule.
  • 21. Checklists. Improve outcomes: ensure experts don’t miss any critical steps; prevent repeating mistakes. Well designed: coherent; living documents; concise, clear and require specific actions; need to be short and well-organized; are NOT step-by-step instructions.
  • 24. DevOps Friendly. Control surfaces matter for scale: simplify complex operations; consistent view; built-in safety. Natural to use: easy to learn, discover. Natural to extend: every developer.
  • 26. dsh: CLI; full stack; fast; safe; secure; proactive; discoverable.
  • 27. Model Driven. Creates consistency. Provides guard rails. Model levels: Deployment (Cluster (Instance), Assembly). Configured at all levels.
  • 30. dsh: Scala; model based; trivial to extend; specific to OUR needs; meaningful defaults; prevents mistakes.
  • 31. val filter = FilterBuilder.withCluster("zk").withOnlyRunningInstances.build(); val instances = deployment.connect.describeInstances(filter); instances.par.foreach { instance => val ssh = instance.connectSSH; ssh.execute("sudo service api restart") }
  • 32. What would we do differently next time? Upgrade the system less monolithically. Don’t ask UI developers to do operations. Set clearer guidelines on managers & operations.
  • 33. Next Experiments. Divide up the big rotation; bring the India development team into the rotation; switch from 24/7 shifts to 12/7; deploy smaller parts of the system more often; bring full-time operations people into the mix.
  • 34. Thank You! Christian Beedgen @raychaser; Stefan Zier @stefanzier. We’re hiring! go.sumologic.com/jobs
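
Slides 27, 30 and 31 describe dsh as model driven, with guard rails and a filter-based way to select instances. The following is a minimal in-memory sketch of that idea in Scala. It is not the dsh API: `Instance`, `Deployment`, `checkDeploy` and this `FilterBuilder` are hypothetical stand-ins written for illustration, and only the call chain `withCluster(...).withOnlyRunningInstances.build()` is modeled on slide 31. The SNAPSHOT guard is an assumed implementation of the production safeguard mentioned in the editor's notes.

```scala
// Hypothetical, in-memory stand-in for a deployment model; NOT the real dsh API.
object ModelSketch {
  final case class Instance(id: String, cluster: String, running: Boolean)
  final case class Deployment(name: String, production: Boolean, instances: List[Instance])

  // Immutable builder that produces a predicate over instances.
  final case class FilterBuilder(cluster: Option[String] = None, onlyRunning: Boolean = false) {
    def withCluster(name: String): FilterBuilder = copy(cluster = Some(name))
    def withOnlyRunningInstances: FilterBuilder = copy(onlyRunning = true)
    def build(): Instance => Boolean =
      i => cluster.forall(_ == i.cluster) && (!onlyRunning || i.running)
  }

  // Entry point shaped like the slide's FilterBuilder.withCluster("zk") call.
  object FilterBuilder {
    def withCluster(name: String): FilterBuilder = FilterBuilder(cluster = Some(name))
  }

  // Guard rail (assumed form): refuse SNAPSHOT builds in production deployments.
  def checkDeploy(deployment: Deployment, version: String): Either[String, String] =
    if (deployment.production && version.endsWith("-SNAPSHOT"))
      Left(s"refusing to deploy $version to production deployment ${deployment.name}")
    else
      Right(s"deploying $version to ${deployment.name}")

  def main(args: Array[String]): Unit = {
    val prod = Deployment("prod", production = true, List(
      Instance("i-1", "zk", running = true),
      Instance("i-2", "zk", running = false),
      Instance("i-3", "api", running = true)))

    val filter = FilterBuilder.withCluster("zk").withOnlyRunningInstances.build()
    println(prod.instances.filter(filter).map(_.id))
    println(checkDeploy(prod, "1.2.3-SNAPSHOT"))
  }
}
```

Putting the guard in the model rather than in each command is one way to get the "make any mistake exactly once" effect: every command that deploys goes through the same check.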

Editor's Notes

  1. Founders and initial team were all back-end Java devs.
  2. Organically grown, possibly unique to us. May give you ideas.
  3. 1) Learned. You become enculturated when you join Sumo. 2) Shared by the members of the on-call rotation. 3) Patterned. People in the rotation live and think in ways that form definite patterns. 4) Mutually constructed through a constant process of social interaction. 5) Internalized. Habitual. Taken for granted. Perceived as “natural.” Examples of our culture.
  4. We like feedback loops.
  5. Members chosen based on track record. There are no meetings. 24/7 CMB session. Quick and frictionless.
  6. How do you learn what you need to know, then stay in the picture?
  7. Tactical: What’s going on with the system NOW? Strategic: What do I need to know to run the system?
  8. Health checks embedded in the code. Require a playbook for every alert. Documentation: “unhealthy when”. Side effect: forces meaningful alerts.
  9. Example: Doctors leave clamps in patients. Used in other industries with great success (pilots, doctors). Atul Gawande, The Checklist Manifesto. Need to be well managed. Focus on the 80%. Coherent = edited by 1 person, with suggestions from everybody.
  10. Create sections and describe when they matter. Sometimes include reminders of when to do non-obvious things. Checklists we use regularly: GA readiness; deploy to production; getting ready for on-call rotation; on-call handover.
  11. The interfaces DevOps touch and interact with. Turns out, they matter.
  12. Good control surfaces help scaling, help learning, help automating. So… what do backend developers like? UIs? Mice? No.
  13. They’re good with CLIs. But CLIs have to be good and easy to learn.
  14. Our internal orchestration tool is called dsh. It’s a CLI. Does the full stack. Uses a really nice readline prompt (jline) with tab completion, history, all the stuff bash has. Uses threading aggressively to make things fast. Has lots of built-in safeguards. We learned from our mistakes. Encourages good security practices. Example: integration with IronKeys. Proactive: checks for things that may cause failures before they do. Example: AWS instance limit.
  15. Forces users to do the right thing in a standardized way.
  16. This command performs a rolling restart of the api and receiver assemblies. Here’s what happens behind the scenes: We load the model and find out which account the deployment is in. We load the credentials for that account from the IronKey. Consult the model to find out which clusters run api and receiver. Use AWS API to query for the list of instances running in those clusters (using tag query). Query an external service for our own IP address. Use AWS API to query security group. If our IP address isn’t included, add it. Calculate what 25% of API and 10% of receiver amounts to. Launch a thread pool with the correct number of threads. SSH into the machines. Run the script that restarts the daemon. Check Zookeeper and wait for the daemon to be back in service. If applicable, wait for ELB to show the instance healthy again. Gather any error messages.
  17. Started out as developers, chose Scala since it was most natural. Model of deployments, clusters, instances, other AWS resources. Adding new commands is REALLY easy. The model is deeply ingrained and omnipresent. Some of the functionality is aware of our application code. Use defaults to manage how you want ops to behave. Special safeguards for production deployments. Make any mistake exactly ONCE. Examples: don’t allow deleting EBS volumes in prod; don’t allow deploying SNAPSHOT builds to prod.
  18. Example of how our model interacts with AWS and Scala. Worth noting how you can easily interact with the model without knowing much about the guts.
  19. As we scale the team further, we will keep on experimenting.
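
Note 16 describes a rolling restart that works on a fixed share of each cluster at a time (25% of api, 10% of receiver) and waits for health before moving on. The batching part can be sketched in Scala as follows. This is an illustration under stated assumptions, not dsh code: `batchSize`, `rollingRestart`, and the `restart` and `isHealthy` callbacks are hypothetical names. Rounding down with a minimum of one instance is an assumed policy, and instances within a batch are restarted sequentially here, whereas the notes describe a thread pool.

```scala
// Hypothetical sketch of the batching logic behind a rolling restart; NOT dsh code.
object RollingRestartSketch {
  // Instances restarted per batch: a percentage of the cluster, rounded down,
  // but never fewer than one (assumed policy).
  def batchSize(clusterSize: Int, percent: Int): Int =
    math.max(1, clusterSize * percent / 100)

  // Restart batch by batch and stop at the first batch that fails its health check.
  // `restart` stands in for "SSH in and run the restart script";
  // `isHealthy` stands in for the Zookeeper / ELB checks.
  def rollingRestart(instances: List[String], percent: Int)(
      restart: String => Unit,
      isHealthy: String => Boolean): Either[String, Int] = {
    val batches = instances.grouped(batchSize(instances.size, percent)).toList
    batches.foldLeft[Either[String, Int]](Right(0)) {
      case (Right(done), batch) =>
        batch.foreach(restart)
        batch.find(i => !isHealthy(i)) match {
          case Some(bad) => Left(s"$bad did not come back; stopped after $done healthy batches")
          case None      => Right(done + 1)
        }
      case (failed, _) => failed // an earlier batch failed: do not touch further instances
    }
  }

  def main(args: Array[String]): Unit = {
    val api = (1 to 8).map(n => s"api-$n").toList
    val result = rollingRestart(api, 25)(i => println(s"restarting $i"), _ => true)
    println(result)
  }
}
```

Stopping at the first unhealthy batch bounds the damage of a bad restart to one batch, which is the point of restarting only a fraction of a cluster at a time.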