How New Relic Designs for DevOps
Here at New Relic,
we’re just beginning to give official definition to our Operations team
– or as we like to call it, Site Engineering. But that’s not to say we
haven’t spent considerable time and effort in establishing the type of
team we want it to be. While collecting 51 billion metrics per day has
provided us with some interesting challenges along the way, we haven’t
been sidetracked into building a complex infrastructure to support such a
massive undertaking. In fact, it’s our dedication to simplicity in
operations that has given us the opportunity to focus on what really matters: our users.
Below is a summary of our story so far and the lessons we’ve learned. You can also view the accompanying presentation.
“Dedication to Efficiency, Getting Along on Little Power”
Operations, and the technology choices we make, shouldn’t be a burden
or distraction from our primary focus — delivering happiness to our
users. Using a well-established and stable technology stack, we’re able
to maintain our infrastructure with little effort and few
resources. By making technology choices that are efficient and
uneventful, we can focus on what really matters. This helps us avoid
spending time fighting fires or establishing a complex system to prevent
them.
Coupling Our Business to Technology Choices
We want to get our application into our users’ hands as fast as
possible and get their feedback just as quickly. This means we can’t
have our business choices predicated on our technology choices. We need
to make shifts as we learn what works for our users and what doesn’t.
The same loose coupling principles we use in software development also
apply to our operations and technology stack. If you’ve implemented a
particular technology (programming language, datastore, etc.) that
requires significant reworking as you make the inevitable shifts, then
you’ve coupled too tightly.
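As a rough illustration of the idea, the sketch below keeps application code dependent on a small interface rather than on one datastore, so swapping the backing technology means writing one new class instead of reworking every caller. All names here (MetricStore, InMemoryMetricStore, record_response_time, average) are hypothetical and chosen for the example; this is not a description of New Relic's actual code.

```python
from typing import Protocol


class MetricStore(Protocol):
    """Hypothetical interface: the only thing callers are allowed to know about storage."""

    def write(self, name: str, value: float) -> None: ...

    def read(self, name: str) -> list[float]: ...


class InMemoryMetricStore:
    """One interchangeable backend; a SQL or key-value backend would expose the same two methods."""

    def __init__(self) -> None:
        self._data: dict[str, list[float]] = {}

    def write(self, name: str, value: float) -> None:
        self._data.setdefault(name, []).append(value)

    def read(self, name: str) -> list[float]:
        # Return a copy so callers cannot mutate internal state.
        return list(self._data.get(name, []))


def record_response_time(store: MetricStore, millis: float) -> None:
    """Application code depends on the interface, not on a concrete datastore."""
    store.write("response_time_ms", millis)


def average(store: MetricStore, name: str) -> float:
    """Average of all recorded values for a metric, or 0.0 if none exist."""
    values = store.read(name)
    return sum(values) / len(values) if values else 0.0
```

If the datastore has to change later, only a new class with the same write and read methods is needed; record_response_time and average stay untouched.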
Engineer, Don’t Administer
We want to solve our infrastructure problems through engineering the
solution, not administering it. When hiring, New Relic looks for
individuals who know how to build tools that remove pain points rather
than individuals who are experts in a particular piece of technology.
Companies and products shift over time. We want people who can ease the
friction from those shifts. That means we need generalists who take the
best from disparate areas. As we continue down the path to DevOps, we
need our operations staff to think like software developers because,
over time, our infrastructure will begin to look more and more like
software.
Operations Should Be Interesting, Not Exciting
Firefighting may look exciting in the movies. But as an operations
team, that’s not what we want to be known for. The operations world is
filled with brilliant and passionate people who are stuck fighting fires
at 3 am, instead of spending their days working on hard and interesting
problems. We want to fix that and get to a point where failure at 3 am
doesn’t wake anyone up. If we’re spending time fighting fires, we’re not
spending our time creating a great product. An operations team can be
building self-healing and self-organizing infrastructures. Or it can be
resolving database replication problems in the middle of the night.
Which would you rather be working on?
Be Deliberate
We are deliberate when making choices about our operations processes
and technologies. In the same way that ancient tools found in a buried
city tell us something about that culture, our tool choices
tell us something about ours. We look for tools that are mature and
have a healthy ecosystem around them. We want to understand our tools
intimately. We’re going to push them to their limits and want to know
how to improve them as they start to show their stress points. We want
tools that we can easily integrate into our world and swap out when
needed. We are extremely considerate of our culture and choose our
technology with the same level of consideration. What do your tools and
processes say about you?
Optimize for Discovery
Our technology and process choices have aligned us to be optimized for
discovery. We want to get our products out to our users quickly, so the
barriers standing in the way of that are broken down first. Implementing
a continuous delivery pipeline was neither simple nor easy, but now our
users can see improvements and features rapidly. Feature flags allow us
to roll out features incrementally, to A/B test them and quickly iterate
on ideas. As we work on complex systems, we know something will
inevitably break. We want a resilient infrastructure where our MTTR is
small and MTBF isn’t even a consideration.
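One common way to roll a feature out incrementally is to hash each user into a stable bucket and enable the flag only for buckets below the current rollout percentage. The sketch below shows that technique under stated assumptions: the function names, the flag name and the 50/50 A/B split are all hypothetical, and nothing here is meant to describe how New Relic's own feature flags are built.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket from 0 to 99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage (0 to 100)."""
    return rollout_bucket(flag_name, user_id) < rollout_percent


def ab_variant(flag_name: str, user_id: str) -> str:
    """Assign a user to variant "A" or "B" using the same stable bucket."""
    return "A" if rollout_bucket(flag_name, user_id) < 50 else "B"


if __name__ == "__main__":
    # "new_dashboard" is a made-up flag name used only for this demo.
    users = [f"user-{n}" for n in range(1000)]
    enabled = sum(is_enabled("new_dashboard", u, 10) for u in users)
    print(f"{enabled} of {len(users)} users see the feature at a 10% rollout")
```

Because the bucket depends only on the flag name and the user ID, a user who sees the feature at 10% still sees it when the rollout is raised to 50%, and setting the percentage to 0 turns the feature off for everyone without a deploy.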
Every employee at New Relic is behind our build, measure, iterate cycle, and operations is at the center of this process. Without careful cultivation of our DevOps culture, we wouldn’t be optimized for discovery.