Infrastructure – The Challenge of Small Ops – Part 3
Find Part 1 and Part 2 through these links.
Infrastructure is hard to build. This is true when putting together
compute clusters, or when dealing with roads or power lines. Typically
this involves both increases in operating expenses and capital expenses,
and a small mistake can be quite costly.
Limited Resources
All organizations have goals. Sometimes these goals are built around
reliability, and sometimes they are built around budgets, but most of
the time both are important. In large organizations a few extra servers
don’t usually carry a material cost impact, but in a small organization
one extra server can double the cost of a project. Without a large
budget, some reliability goals can be quite challenging to meet.
Reliability
Engineering is the art of making things as weak as they need to be to
survive. When putting together infrastructure in a small environment it’s
helpful to give someone the explicit job of reliability engineering. They
should look at your application and outline what is required to
provide the basic redundancy your organization needs. Then they should
see how they can line up the budget with the requirements and get you a
solution that meets your up-time needs as well as your pocketbook.
N+1
This term is batted around often when discussing redundancy. In a very
large organization N may equal 1000, so N + 1 is 1001, but in a
small organization N is often 1, making N + 1 equal to 2. This is often a
hard problem when you may only be allowed to buy 2
servers for a three-tiered application, but you can work around it.
Virtualization can really help out, but it increases the planning
demands. You will need to ensure that you have the capacity in each
piece of physical equipment to meet your needs, and that system roles
are always placed such that they can fail
independently. While this sounds simple, it really needs an owner to
keep track of it, just to make sure you don’t lose your primary and
backup service at the same time.
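As a rough illustration of the bookkeeping that owner would do, the sketch below flags any service role whose instances all sit on the same physical host. The inventory, the VM names, and the function name are hypothetical, made up for this example; a real environment would pull this data from its virtualization platform.

```python
from collections import defaultdict

# Hypothetical inventory: VM name -> (service role, physical host).
placement = {
    "web-1": ("web", "host-a"),
    "web-2": ("web", "host-b"),
    "app-1": ("app", "host-a"),
    "app-2": ("app", "host-b"),
    "db-primary": ("db", "host-a"),
    "db-standby": ("db", "host-a"),  # deliberate mistake for the example
}


def roles_sharing_one_host(placement):
    """Return the roles whose instances all live on a single physical host."""
    hosts_by_role = defaultdict(set)
    for role, host in placement.values():
        hosts_by_role[role].add(host)
    # A role needs at least two distinct hosts to survive a host failure.
    return sorted(role for role, hosts in hosts_by_role.items() if len(hosts) < 2)


print(roles_sharing_one_host(placement))
```

Here the database primary and standby share host-a, so losing that one machine takes out both. Running a check like this whenever a VM is moved is one cheap way to catch the problem early.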
The second issue with N + 1 redundancy is that when N equals 1 you need to
plan capacity carefully. The best solution in this case is to use an
active-passive setup. If you use active-active setups you need to be
careful that you don’t exceed 50% of your total capacity, since losing
one of your two servers will remove 50% of your capacity.
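The 50% figure comes from simple arithmetic, sketched below under the assumption that all servers are identical and that load can be redistributed evenly after a failure. The function names and the load numbers are hypothetical and only meant to illustrate the rule.

```python
def max_safe_utilization(nodes, failures_tolerated=1):
    """Fraction of total capacity you can use and still absorb the failures."""
    if nodes <= failures_tolerated:
        raise ValueError("need more nodes than the failures you want to tolerate")
    return (nodes - failures_tolerated) / nodes


def survives_failure(load_per_node, capacity_per_node, failures_tolerated=1):
    """True if the surviving nodes can carry the whole load after the failures."""
    surviving = len(load_per_node) - failures_tolerated
    return surviving > 0 and sum(load_per_node) <= surviving * capacity_per_node


print(max_safe_utilization(2))           # two active-active servers
print(survives_failure([40, 40], 100))   # each server at 40% of its capacity
print(survives_failure([60, 60], 100))   # each server at 60% of its capacity
```

With two servers the safe ceiling is half of the total capacity: two servers at 40% each can collapse onto one, while two servers at 60% each cannot.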
Wrapping it Up
Infrastructure is one of the harder things to get right in a small org.
Take your time and think about it. Always keep an eye on your budget and
your reliability goals.
This is the final installment in this series. Take a look at Part 2 and Part 1 if you liked this article.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)





