The decision and process behind rewriting or re-architecting a system are often plagued with a series of problems: people always underestimate the complexity, people never fully understand the customers, system requirements constantly change out from under them, and, in almost all cases, it takes much longer than anybody can predict. As part of this workshop, we’ll look at a couple of case studies of re-architecture to glean strategies for success from them as well as common pitfalls to avoid. This workshop should arm you with a framework by which to approach your own decisions around how to manage, maintain, and evolve your own systems:
* understanding the underlying motivations;
* developing a method for deciding whether to evolve or to rewrite;
* managing the engineering effort of re-architecture in the midst of a changing business;
* setting up metrics to understand whether you’re on course; and
* organizing the engineering team and the culture to ensure success.
4. Raffi Krikorian / raffi.krikorian@gmail.com / 9 March 2015
Software Architecture Conference 2015
SO… MY APOLOGIES
I BELIEVE IN “FULL-CONTACT”
PRESENTATIONS
Re-architecting on the fly
THE EXERCISE
The Rules
Break up into teams of a few people
Take the following case study
discuss
dissect
ask questions
Play the role of architects and also engineering leads
Volunteers to present when we’re done
23. Twitter already had some availability issues
earlier this week, leading to a complete
outage of about 90 minutes on Wednesday
and more fail whales popping up yesterday.
24. What we didn’t anticipate was some of the
complexities that have been inherent in
fixing and optimizing our systems before
and during the event.
…and the team?
We can’t ship things fast enough!
‘Velocity’ is going down! - Management
This code looks like crap! - Engineering
Some hints as to what to do
You are not just architects; you are engineering leaders
How do you get a plan in place?
How do you figure out your first step?
What does the execution of the plan look like?
Then… what?
(Yes, this is vague. I’ll be floating around the room - ask
me questions!)
DO MY FORMER JOB
Not servicing customers
Not scaling to incoming traffic, usage, users, etc.
But also not matching the features that users want
HipChat architected for Amazon Web Services
customer demand to move it to Enterprise
lacking higher level services
lots of smaller machines vs. a small number of bigger machines
It’s easier to write code than it is to read code
We're programmers. Programmers are, in their
hearts, architects, and the first thing they want to do
when they get to a site is to bulldoze the place flat
and build something grand. We're not excited by
incremental renovation: tinkering, improving,
planting flower beds.
- Joel Spolsky
It’s easier to write code than it is to read code
Unless properly tended, real world code becomes
complicated over time
energy goes into bug fixes
every week in production means the code is gathering
bandaids, & experiences
Code reviews should be (partially) used to structure
software for “readability”
Technical debt
/**
* For the brave souls who get this far: You are the chosen ones,
* the valiant knights of programming who toil away, without rest,
* fixing our most awful code. To you, true saviors, kings of men,
* I say this: never gonna give you up, never gonna let you down,
* never gonna run around and desert you. Never gonna make you cry,
* never gonna say goodbye. Never gonna tell a lie and hurt you.
*/
http://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered
Technical debt
//
// Dear maintainer:
//
// Once you are done trying to 'optimize' this routine,
// and have realized what a terrible mistake that was,
// please increment the following counter as a warning
// to the next guy:
//
// total_hours_wasted_here = 42
//
http://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered
You haven’t paid it down
Technical debt (also known as design debt or code debt) is a
recent metaphor referring to the eventual consequences of poor
system design, software architecture or software development
within a codebase. The debt can be thought of as work that
needs to be done before a particular job can be considered
complete or proper. If the debt is not repaid, then it will keep
on accumulating interest, making it hard to implement
changes later on. Unaddressed technical debt increases software
entropy.
- Wikipedia
The causes
BUSINESS PRESSURES - the business wants something released sooner,
before all of the necessary changes are complete. That code is shipped “unfinished”.
LACK OF PROCESS - development is not well managed. Big, long-lived, & isolated
changes are in flight & merging those changes becomes really difficult. Usually, there
is no test suite, which encourages quick & risky band-aids to fix bugs.
ARCHITECTURE - modularity hasn’t been managed, & the software is not flexible enough
to adapt to changes in business needs. As the requirements evolve, the code proves
unwieldy & must be refactored in order to support future requirements. The longer
that refactoring is delayed, & the more code is written to use the current form, the
more debt piles up that must be paid when the refactoring is finally done.
LACK OF ENGINEERING MENTORSHIP - knowledge isn't shared around the organization,
business efficiency suffers, or junior developers are not properly mentored. Possibly,
the wrong engineers have been hired as well.
More, & more, & more
Time seems to dilate
each change is taking longer than we think
each change is turning out to be harder than we think
Each change results in a cascade of new defects
This causes your team to enter a vicious spiral
History of Word
The time is October 5, 1991
Microsoft has shipped the award-winning Mac Word 5.0
Win Word 2.0 shipped in 1991 as well, with more features
Business problem is emerging - WordPerfect is eating up part
of the market
Technical problem - Mac Word and Win Word are different
codebases
Proposed solution? Project Pyramid.
Pyramid was cancelled
Microsoft decided that “taking a few steps back” for Pyramid would
cause them to lose to WordPerfect
New strategy: target Mac Word from the Win Word codebase
Huge problems
Windowing is handled fundamentally differently between the operating
systems
68K Mac OS doesn’t do memory management
compiled with a beta version of Visual C++ 2.0 (so beta that most
optimizations had to be turned off)
In the end - Mac Word 6.0 wasn’t “Mac-like”
Apache HTTP
[Timeline figure, 1996 to 2002. Events shown, in the order extracted from the slide: whispers of Apache 2; Apache Week talks about filtering & difficulty in implementation, need multi-threading; huge rewrite planned for post 1.2; decide same codebase for all releases in 2.0; halt features in pre 2.0; 2.0 alpha; SSL in 1.3 & 2.0; 2.0 beta; 2.0 GA; planning in place]
EC2 performance is sad
Customers demand exceptional performance & always-on
availability
experienced networking issues
hanging VM instances
unpredictable performance degradation (probably due to
noisy neighbors)
Spending more time working around these issues than
building features
Migrating Swiftype
Nothing fundamentally wrong with the architecture, just not
suitable for a zero downtime migration
Need connectivity between EC2 and the new datacenter
Migrate data
stateless services are not a problem
choose whether the backend services are replicated or deal with
inter-datacenter latency
build custom replication services for backends that can’t do it
natively (search)
What happens during a rewrite?
Business stays still or business moves on
Have to implement accrued business logic in a fraction of the time
old code is used
old code is tested
lots of bugs have been found and fixed
The old system can’t be turned off until the new system is at
100% parity (or the company agrees to turn off the old)
This will ABSOLUTELY take a lot longer than anybody thinks!
BAD!
BAD!
What about non-code issues?
You’re going to gather seriously unhappy customers
external - no features!
internal - things are being held up
Political battles within engineering
Excessive frustrations because deadlines
will almost certainly be missed
NOBODY EVER TAKES THIS STUFF
INTO ACCOUNT
NEVER do one if you can avoid it
It will take longer than you think
The market, product, & business will change while your
development is in flight
Existing customers will become frustrated
YOU DON’T CONTROL THE REWRITE,
IT CONTROLS YOU
SO, I HAVEN’T TALKED YOU
OUT OF IT YET
Deep philosophical question
Are you better than those that came before you?
If you have the same team, there is NO GUARANTEE that
your team now will do better the second time through
If you have a different team, there is NO GUARANTEE that
your team now will do better than the last team
This isn’t big, right?
How good is your documentation?
Do you even know all the corner cases?
Systems get stronger by being in production
they accumulate bug fixes, bandaids, operational
knowledge
how well do you know all those?
can you reproduce all those?
ALL THE THINGS TO PLAN FOR
Functionality
Do you REALLY know what the system does?
all the corner cases of the product?
“make it do what it already does” is harder than you think
Most programmers don’t know what to ask!
doubly true if they weren’t the original designers of the system
even if they are the original designers, no way they remember
all the corner cases
Can code serve as the spec?
PROBABLY
NOT!
Feature creep
This is incredibly tempting, especially if feature
development is halting on the old system
It will potentially kill you
YOU TRY NOT TO DO THIS DURING REGULAR
DEVELOPMENT, SO, WHY START NOW?
Flexibility breeds complexity
If you try to build a system
that is flexible, you’ll
probably get a system that is
complicated
If you build a system that is
simple, you may be able to
build something that is
flexible on top of it
Aside: The Unix Pipeline
program1 | program2 | program3
A set of processes chained by their standard streams.
The output of each process (STDOUT) feeds directly as
input (STDIN) to the next one
A very simple idea that evolves into component-based software
engineering
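As a rough illustration of the pipeline idea above, the following Python sketch chains three child processes so that each one's STDOUT becomes the next one's STDIN. The three one-line "programs" and the run_pipeline helper are hypothetical stand-ins invented for this example; only the standard subprocess module is used.

```python
import subprocess
import sys

# Three tiny stand-in "programs" (hypothetical): each reads STDIN, writes STDOUT.
UPPERCASE = "import sys; sys.stdout.write(sys.stdin.read().upper())"
REVERSE = "import sys; sys.stdout.write(sys.stdin.read()[::-1])"
EXCLAIM = "import sys; sys.stdout.write(sys.stdin.read() + '!')"


def run_pipeline(programs, text):
    """Run each program as a child process, wiring STDOUT to the next STDIN."""
    procs = []
    upstream = subprocess.PIPE  # the first stage reads from a pipe we write to
    for source in programs:
        proc = subprocess.Popen(
            [sys.executable, "-c", source],
            stdin=upstream,
            stdout=subprocess.PIPE,
        )
        if procs:
            # The new child holds its own copy of the read end, so the parent
            # closes its copy and lets end-of-file propagate down the chain.
            procs[-1].stdout.close()
        procs.append(proc)
        upstream = proc.stdout

    procs[0].stdin.write(text.encode())
    procs[0].stdin.close()  # signals end-of-input to the first stage
    output = procs[-1].stdout.read().decode()
    procs[-1].stdout.close()
    for proc in procs:
        proc.wait()
    return output


if __name__ == "__main__":
    # Equivalent in spirit to: program1 | program2 | program3
    print(run_pipeline([UPPERCASE, REVERSE, EXCLAIM], "abc"))  # prints CBA!
```

Each stage knows nothing about its neighbors; the only contract is the byte stream, which is what makes the stages easy to recombine.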
The different stages of software
IRRELEVANT
“OLD SCHOOL”
MAINSTREAM
EARLY ADOPTER
IDEA TODAY
Nothing works as expected
1. TECHNOLOGY TRIGGER - proof-of-concept stories &
media interest trigger significant publicity. No usable
products exist & commercial viability is unproven.
2. PEAK OF INFLATED EXPECTATIONS - some success & lots
of failures. Few companies take action; many do not.
3. TROUGH OF DISILLUSIONMENT - experiments &
implementations fail. Investments continue only if
the surviving providers improve their products.
4. SLOPE OF ENLIGHTENMENT - instances of the
technology’s benefits start to crystallize & become
more widely understood. 2nd- & 3rd-gen products
appear from providers. More pilots funded;
conservative companies remain cautious.
5. PLATEAU OF PRODUCTIVITY - mainstream adoption
starts to take off. The technology's broad market
applicability & relevance are clearly paying off.
The 5 stages of grief
1. ELATION - yay! we get to throw out that crappy product! we
get to start it over, & do it the right way!
2. BUCKLE DOWN - wow, there is a lot of work here.
3. OH SHIT - there really is a lot of work here. We’re never going
to get it done in time.
4. EXHAUSTION - we missed our deadline. Again.
5. RELIEF - we’re out the door! But… it’s crappy. We cut
corners, but, we’re done. Next time, we’re totally going to do
it right.
http://www.jamesshore.com/Blog/How-to-Survive-a-Rewrite.html
ONE LAST TRY, BECAUSE I DON’T
WANT YOU TO DO THE REWRITE!
Ammunition against the rewrite
One goal of a rewrite is usually to create a cleaner
codebase
remember how learning all the edge cases is hard?
remember how writing code is easier than reading code?
to save time, a lot of developers cut and paste code from the
old system to the new system!
It’s hard to create a cleaner codebase during the rewrite!
Can we quantify this somehow?
Can we make a more data driven decision?
Gives more confidence to the decision, and helps sell it
better to others
A rewrite is a massive investment, and it’s only fair to be
really sure before going down that road
If you don’t do anything
$p_{\text{starting}} + (v_{\text{current}} \times t)$
If you are on a new system
$p_{\text{starting}} + (v_{\text{new}} \times (t - t_{\text{rewrite}}))$
What we’re betting on
$p_{\text{starting}} + (v_{\text{current}} \times t) < p_{\text{starting}} + (v_{\text{new}} \times (t - t_{\text{rewrite}}))$
What we’re betting on
$v_{\text{current}} \times t < v_{\text{new}} \times (t - t_{\text{rewrite}})$
For some value of $t$ where $t > t_{\text{rewrite}}$
Variables to play with
$v_{\text{new}}$: the new system has to be faster to iterate on
$t_{\text{rewrite}}$: we need to minimize the time to write the new system
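To make the bet concrete, here is a small Python sketch that solves the inequality above for t. It assumes velocities are constant and non-negative and that the rewrite time is positive; the function name and the example numbers are made up for illustration and are not from the talk.

```python
def break_even_time(v_current, v_new, t_rewrite):
    """Return the time after which the rewrite is ahead, or None if it never is.

    Rearranging  v_current * t < v_new * (t - t_rewrite)  gives
        v_new * t_rewrite < (v_new - v_current) * t
    so, when v_new > v_current,
        t > v_new * t_rewrite / (v_new - v_current).
    If the new system is not faster to iterate on, the inequality never holds.
    """
    if v_new <= v_current:
        return None
    return v_new * t_rewrite / (v_new - v_current)


if __name__ == "__main__":
    # Hypothetical numbers: 2 features/month today, 3 features/month on the
    # new system, and a rewrite that takes 6 months.
    print(break_even_time(v_current=2.0, v_new=3.0, t_rewrite=6.0))  # prints 18.0
```

With these made-up numbers, a system that is 50% faster to iterate on still needs 18 months from the start of the rewrite just to catch up, which is why both variables matter.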
How do we estimate $v_{\text{new}}$?
Take the new tech for a spin
Have a small team build something non-trivial
This will not be a hard science
gives you a sense of the extent of the software
gives you a sense of the capabilities of the team
Have to understand maintenance costs, scalability costs
How do we estimate $t_{\text{rewrite}}$?
Given that you estimated $v_{\text{new}}$, use that to try to scope
out the rewrite
Given an estimate of $t_{\text{rewrite}}$, try to understand whether
that time is too long
what is the state of the product now?
how much time will you lose in the business?
OK, I STILL HAVEN’T TALKED
YOU OUT OF THIS
Hold the line
Product managers are going to get
anxious
you’ve sold them on a great vision
they feel like you are taking too
long and blocking them
You have to manage them (or,
ignore them)
Do the smallest thing you can
each step of the way
Define “done”
100% feature matching is
difficult
Implementing features is hard
remember how hard it was to
write it the first time?
you’re at a different part of
the software cycle
Try to do things incrementally
Aside: The Saturn V
Von Braun says “German
engineering dictates that you
should test each part”
Worst-case situation: a 1,400-foot
fireball that will burn for 35 s
at 2,500 °F
Will we make the “end of the
decade”?
George Mueller pulled rank
and pushed for an “all-up test”
Be incremental
All-up almost never works - the
Saturn V is an exception
Don’t be “lazy” and assume all-up is
easier
take an integrated approach & be ready
to release at every step of the way
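One way to stay releasable at every step is to put a thin router in front of both systems and move one route at a time. The Python sketch below is only an illustration under that assumption; the IncrementalRouter class, its method names, and the route names are hypothetical and not from the talk.

```python
class IncrementalRouter:
    """Send migrated routes to the new system; everything else stays on the old."""

    def __init__(self, old_handler, new_handler):
        self.old_handler = old_handler
        self.new_handler = new_handler
        self.migrated = set()

    def migrate(self, route):
        """Start serving this route from the new system."""
        self.migrated.add(route)

    def roll_back(self, route):
        """Return this route to the old system if the new one misbehaves."""
        self.migrated.discard(route)

    def handle(self, route, request):
        handler = self.new_handler if route in self.migrated else self.old_handler
        return handler(route, request)


def old_system(route, request):
    return f"old:{route}"


def new_system(route, request):
    return f"new:{route}"


if __name__ == "__main__":
    router = IncrementalRouter(old_system, new_system)
    router.migrate("home")
    print(router.handle("home", {}))     # prints new:home
    print(router.handle("profile", {}))  # prints old:profile
```

Because the old system keeps serving every route that has not been moved, each migration is a small, reversible release rather than an all-up cutover.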
Find the starting line
If you're going with feature parity, then do the part of the
system with the biggest bang first
If you can, spend a bit of time extending your runway
too
Drive with data, & instrument the old system to gather
that data
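A minimal sketch of instrumenting the old system, assuming it is a Python service whose entry points can be wrapped with a decorator. The call_counts counter, the instrumented decorator, and the endpoint names are hypothetical; a real system would more likely feed an existing metrics pipeline.

```python
from collections import Counter
from functools import wraps

# Hypothetical in-process counter; stands in for a real metrics backend.
call_counts = Counter()


def instrumented(name):
    """Count every call to the wrapped function under the given name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            call_counts[name] += 1
            return func(*args, **kwargs)
        return wrapper
    return decorator


@instrumented("home")
def home_timeline(user_id):
    return f"timeline for {user_id}"


@instrumented("profile")
def profile(user_id):
    return f"profile for {user_id}"


def traffic_share():
    """Return each endpoint's fraction of all recorded calls, busiest first."""
    total = sum(call_counts.values())
    return {name: count / total for name, count in call_counts.most_common()}


if __name__ == "__main__":
    for user in (1, 2, 3):
        home_timeline(user)
    profile(1)
    print(traffic_share())  # prints {'home': 0.75, 'profile': 0.25}
```

A breakdown like this is what lets you pick the starting line from data instead of from intuition.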
Twitter API
[Chart of the Twitter API by area: Home, Activity, Profile, Photos, Other]
Don’t ignore data!
Your implementations may change, but your
underlying data will change very slowly
Fake or synthetic data gives you a false sense of
security
Test with real data as soon as possible
get real data piped through the system
figure out how you are going to do reconciliations
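Reconciliation can start as simply as replaying the same real requests through both systems and listing every disagreement. The Python sketch below assumes both systems can be called as plain functions; reconcile, old_normalize, and new_normalize are hypothetical names chosen for the example.

```python
def reconcile(requests, old_system, new_system):
    """Replay each request through both systems and collect the mismatches."""
    mismatches = []
    for request in requests:
        expected = old_system(request)  # the old system is the reference
        actual = new_system(request)
        if actual != expected:
            mismatches.append((request, expected, actual))
    return mismatches


def old_normalize(name):
    # The old system quietly strips whitespace: an undocumented corner case.
    return name.strip().lower()


def new_normalize(name):
    # The rewrite missed that corner case.
    return name.lower()


if __name__ == "__main__":
    captured = ["Hello", " World "]
    for request, expected, actual in reconcile(captured, old_normalize, new_normalize):
        print(repr(request), repr(expected), repr(actual))
    # prints ' World ' 'world' ' world '
```

Synthetic inputs would probably never include the stray whitespace; captured production data does, which is the point of testing with real data early.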
Manage tech debt better
Try to reduce schedule pressure
Establish a culture of design quality
Start refactoring, continuous
design, & other code-quality
practices
Partition out a portion of time to
always work on technical debt
Shy away from vanity stuff
Don’t let the sirens drag you to a “hot new language”
Stay away from the trap of “we could do a better job
recruiting if our stack looked like…”
Do what’s right for your team and what your team can
execute on
Prepare for mounting tension
We all know that fixing bugs and firefighting is stressful
imagine one team who has to do this, & another team
who doesn’t have to worry about it
which would you like to be on?
Tension is going to build & people are going to be angry
Make friends, communicate, & be transparent
Know the business
Figure out who all the stakeholders
in the rewrite are
Figure out who the decision makers are
& who controls your fate
they are usually not those who
complain the most
they are those who have “the
power”
Understand all the different
non-technical motivations of the
company
Find your inner salesperson
You have to sell it to the business
Gather all the data
cost savings
feature iteration
reliability
performance
stability
Focus on the data. Don't use
anecdotes, but, instead, show results
of experiments.
Get ready for the politics
You are in a precarious position
You are exposed
taking up engineering resources
not delivering new features for the
company
in an “open transaction”
Keep the in-flight work small and
well integrated
Keep an eye on code quality!
On repeat - getting feature
parity is really hard!
To keep parity, developers,
inevitably, cut & paste code
from a previous system!
Have code quality metrics
Get the team ready
Organizations which design systems ... are
constrained to produce designs which are copies of
the communication structures of these organizations
- Conway’s Law
I still haven’t talked you out of this
1. Hold the line
2. Define “done”
3. Incrementalism
4. Find the starting line
5. Don’t ignore data
6. Manage tech debt better
7. Stay away from vanity stuff
8. Prepare for mounting tension
9. Know the business
10. Get ready for politics
11. Keep an eye on code quality
12. Get the team ready