The Food Chain of Colocation to Managed Hosting

July 11, 2010  |  blog, design  |  No Comments

Third-party services are some form of “managed services” at one of the four levels below:

| Service | Provider Provides | Customer Provides |
|---|---|---|
| Colocation | Data center services including space, bandwidth, and power | Hardware and software |
| Colocation plus hardware platform | Data center services plus servers, routers, network storage, etc. | Software |
| Managed services | Onsite support and specialized services that may include SAN storage, data protection options, etc. | May or may not provide hardware, but does provide the software |
| Managed hosting | In addition to managed services, may offer software as a service | May or may not provide some of its own software |

In colocation, a third party provides space and data center services such as power, and the customer provides its own hardware and software, which the customer administers.

With managed services, the service provider provides on-site support and administration services, and sometimes the hardware to run the software on, while the customer provides the software.

Managed hosting provides the highest level of off-premise service provision: the service provider may take over all of the tasks needed to run the service, as well as providing the software. The distinction between managed services and managed hosting is not precise, and the two terms are often used interchangeably.

Restore Time and Recovery Point Objectives

July 11, 2010  |  blog, design  |  No Comments

The Recovery Time Objective (RTO) for an application is the goal for how quickly you need to have that application’s information back available after an “event” has occurred that stops the application.

The Recovery Point Objective (RPO) for an application describes the point in time to which data must be restored to successfully resume processing (often thought of as time between last backup and when an “event” occurred).
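To make the two definitions concrete, here is a minimal Python sketch that checks a single outage against both objectives. The function name `meets_objectives`, its parameters, and the timestamps are all hypothetical and chosen only for illustration. It assumes that the data-loss window is measured from the last good backup to the event.

```python
from datetime import datetime, timedelta

def meets_objectives(last_backup, event, restored, rpo, rto):
    """Return (rpo_met, rto_met) for one outage. All names here are illustrative."""
    data_loss_window = event - last_backup  # work done since the last good backup
    downtime = restored - event             # how long the application was unavailable
    return data_loss_window <= rpo, downtime <= rto

# Hypothetical outage: backup at 02:00, failure at 09:30, service back at 11:00.
result = meets_objectives(
    last_backup=datetime(2010, 7, 11, 2, 0),
    event=datetime(2010, 7, 11, 9, 30),
    restored=datetime(2010, 7, 11, 11, 0),
    rpo=timedelta(hours=4),  # assumed objective: lose at most 4 hours of data
    rto=timedelta(hours=2),  # assumed objective: be back within 2 hours
)
print(result)  # (False, True)
```

In this made-up case the application came back within its RTO, but 7.5 hours of data fell outside the 4-hour RPO. That shows how the two objectives can be met or missed independently.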

The RTO and RPO metrics are useful in discussing what technologies, products, processes and procedures are required to meet those objectives. Setting the objectives should come from looking at the business impact of applications being unavailable, and the business impact of loss of data.

The interesting thing is that people tend to have a gut reaction that these go hand in hand: you aren’t willing to lose any data, nor are you willing to have any downtime. However, during a realistic SLA assessment, you eventually have to map your RPO and RTO numbers to a cost.

Generally speaking, a high RPO value such as a 5 means that your solution has no use for historic data. One example is a database that aggregates data from other sources: while you may need to bring that database online quickly in the event of a disaster, you might be able to get the data back from the other sources. Another example is a middleware application that participates in a load-balanced farm. Very little data is traditionally held on middleware servers, and you tend to treat them as “goldfish” in most cases. By “goldfish” I mean that you’d flush it down the toilet and go to the store for a new one. In computer terms, you would rebuild the node and join it back into the load-balanced cluster.

An example of a component with a low RPO value would be a database that stores all sales on a B2C website. As your RPO gets to zero, you must have a strong plan for maintaining data redundancy at a secondary site. A value of 1 or 2 means that you can tolerate a few transactions being lost in the event of a failure.

A high RTO value means that you can live without the application or service for a long period of time. An example of a high RTO might be a server that delivers reports on a periodic basis. Of course, some may argue that these reports are essential to the business, which would slide the RTO value towards the center. A server with a low RTO might be a web server farm that services online transactions for a bank. You would want to bring that service back online quickly.

One person’s low value might be another person’s high value. Aggregating people’s perceptions of the RPO and RTO values (and therefore SLAs) can be somewhat subjective.

As RPOs and RTOs move to the center of the line, the costs tend to go up. The combination of the two values defines the mission criticality of the application or service component. You could break this up into a Business Application Criticality chart like the one below:

From the graphic above you can understand why service components with values of 1, 2 and 3 will have the highest SLA’s.

What to do?

The first step is identifying your various service components and mapping them against RTO and RPO values.  Next identify the combination of RPO and RTO values in a table and add the two numbers together.  Finally map the total against the chart above to determine your SLA requirements. 
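The scoring steps can be sketched in a few lines of Python. The component names and their RPO/RTO values below are invented for illustration, loosely following the examples earlier in this post. The only threshold used is the one stated above, that totals of 1, 2 and 3 carry the highest SLAs. The rest of the chart is not reproduced here, so anything above 3 is simply labeled "lower SLA".

```python
# Hypothetical service components with illustrative RPO and RTO values.
components = {
    "sales database": {"rpo": 1, "rto": 1},
    "web server farm": {"rpo": 4, "rto": 1},
    "middleware node": {"rpo": 5, "rto": 2},
    "report server": {"rpo": 3, "rto": 5},
}

def rank_components(components):
    """Add RPO and RTO for each component and sort by the total, lowest first."""
    scored = [(name, v["rpo"] + v["rto"]) for name, v in components.items()]
    return sorted(scored, key=lambda pair: pair[1])

ranked = rank_components(components)
for name, total in ranked:
    # Totals of 3 or less get the highest SLA, per the text; other tiers are not modeled.
    tier = "highest SLA" if total <= 3 else "lower SLA"
    print(f"{name}: {total} ({tier})")
```

With these made-up numbers only the sales database lands in the highest SLA band. The middleware "goldfish" and the report server fall well outside it.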

After all this is done, the next step is to figure out where your single points of failure (SPOFs) are and identify the level of risk each one poses to the SLA. Give each risk a numeric value, where 1 is high risk and 5 is low risk. Usually what follows is a cost breakdown of what it will take (hardware, software and people) to remove each SPOF. At this point, you may want to match up the risks that have low values against the cost to remediate the problem. You will then be able to determine which action item will give you the most “bang for your buck”.
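One simple way to line up risk against cost is to sort the SPOFs by risk value first and by remediation cost second. The Python sketch below does that. The `Spof` type, the example entries, and the cost figures are all hypothetical, and sorting on (risk, cost) is just one possible ordering, not a prescribed method.

```python
from typing import NamedTuple

class Spof(NamedTuple):
    name: str
    risk: int    # 1 = high risk, 5 = low risk (the scale described above)
    cost: float  # estimated cost to remove the SPOF; figures are made up

def prioritize(spofs):
    """Order SPOFs so the highest-risk, cheapest-to-fix items come first."""
    return sorted(spofs, key=lambda s: (s.risk, s.cost))

spofs = [
    Spof("single core switch", 2, 15000.0),
    Spof("database without replica", 1, 40000.0),
    Spof("single power feed", 1, 8000.0),
    Spof("lone report server", 4, 5000.0),
]

prioritized = prioritize(spofs)
for s in prioritized:
    print(f"risk {s.risk}  cost {s.cost:>8.0f}  {s.name}")
```

In this made-up list the single power feed comes out on top, because it is both high risk and comparatively cheap to fix. A real assessment would also weigh factors that this ordering ignores.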

 
