Pitfall #1: Monitoring Components vs. Services
The Scenario
I was once asked by a large manufacturing company to assess their ITIL practices, including
SLM. On my first visit to their offices, I saw extensive SLA reports on display throughout the various IT
departments, printed large and hanging on the walls.
The network team had reports detailing network performance metrics. The server group had reports on server
uptime. This company had invested significant money and resources in these SLA monitoring capabilities, but
the reports didn’t specifically address the company’s critical services, for example the service levels of its
manufacturing automation system.
Ultimately, business management didn’t feel like end users were experiencing an improvement in service levels,
in spite of the huge investments. Further, IT management suffered from a loss in credibility. They came into
meetings with business management and delivered reports that demonstrated glowing numbers for
performance and uptime. But those numbers didn’t jibe with what end users were actually experiencing.
The Solution
Fundamentally, this scenario illustrates a failure in service definition, and it’s a very common pitfall. From the
beginning, you need to involve representatives from the business, or representatives from IT who act as
liaisons to business management, to determine what the business priorities are. Based on these priorities, the
services that need to be monitored should be defined.
As part of this process, requirements and thresholds need to be negotiated as well. For example, in defining
service level reports, one IT organization assumed that the business required a greater degree of uptime than
they actually did. The result? They spent a great deal on monitoring and infrastructure enhancements that were
not necessary. Similarly, IT needs to be cautious about overcommitting, especially early in this process, and
setting groups or individuals up for failure.
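When negotiating thresholds, it can help to translate an availability percentage into the downtime it actually permits. The following is a minimal sketch of that arithmetic; the function name and the 30-day reporting period are assumptions made for illustration, not part of any particular tool.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability target permits."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Compare a few candidate targets over an assumed 30-day period.
    for target in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% allows {minutes:.1f} minutes of downtime")
```

Over 30 days, 99% allows about 432 minutes of downtime, 99.9% about 43 minutes, and 99.99% about 4 minutes. Putting the figures side by side can make it easier for the business to say which level it really needs.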
Engage with business representatives, early and often. Ultimately, by doing so, you can ensure that there’s a
common language for evaluating performance, so priorities get addressed, and IT can most effectively
demonstrate the value it delivers.
Another important consideration has to do with reporting, and tailoring the delivery of information to the
intended audience. At a high level, there are three key audiences for SLA reports: the business, the IT
management responsible for reporting to the business, and the IT management responsible for OLAs. Having
an understanding of which audience is going to be reviewing a given report is essential to ensuring the
effectiveness of that communication.
In the manufacturing company scenario described above, if the IT group had involved business representatives
early on, they would have realized that the uptime or performance metrics of any specific system, whether
database servers or network components, weren’t what was important to that audience. What was important
was the reliability and responsiveness delivered by the core business services that users counted on
every day. On the other hand, having robust monitoring of the performance and uptime of specific elements is
critical for those IT managers responsible for OLAs.
The Monitoring Requirements
The capabilities of monitoring tools can play a vital role in service definition. If you run simplistic monitoring,
such as pinging, you may see that the machine is up, but that doesn’t necessarily ensure that the machine is
doing what the business needs it to. Look for solutions that move beyond simple up/down monitoring to offer
more extensive insights into the health and performance of specific infrastructure elements.
Further, look for solutions that offer insights beyond specific elements that underpin a given service to enable
service level monitoring. This requires an ability to aggregate silo-centric data to obtain a service-centric
perspective, including capabilities for consolidating silo data, processing that data in a unified fashion
against service-level metrics, and displaying service quality metrics. In addition, the solution should provide an
end-to-end infrastructure view that enables administrators to quickly detect and isolate any events or
outages that can affect end-user performance.
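As a rough illustration of how silo data rolls up into a service view, the sketch below treats a service as available in an interval only when every supporting component was up in that interval. The component names, the sample data, and the assumption that every component is required (no redundancy) are hypothetical.

```python
from typing import Dict, List


def service_availability(component_status: Dict[str, List[bool]]) -> float:
    """Percentage of intervals in which every component of the service was up."""
    series = list(component_status.values())
    if not series:
        raise ValueError("at least one component is required")
    n = len(series[0])
    if n == 0 or any(len(s) != n for s in series):
        raise ValueError("all components need the same, non-zero number of samples")
    # An interval counts for the service only if all components were up in it.
    up_intervals = sum(1 for samples in zip(*series) if all(samples))
    return 100.0 * up_intervals / n


if __name__ == "__main__":
    # Hypothetical per-interval up/down samples from three silos.
    status = {
        "network":  [True, True, True, False, True, True, True, True, True, True],
        "database": [True, True, True, True, True, True, False, True, True, True],
        "app":      [True] * 10,
    }
    for name, samples in status.items():
        print(f"{name}: {100.0 * sum(samples) / len(samples):.0f}% up")
    print(f"service: {service_availability(status):.0f}% up")
```

In this made-up data, each silo reports 90% or better, yet the service as a whole was fully available in only 80% of the intervals. That gap is the kind of mismatch between component reports and user experience described in the scenario.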
Finally, gaining visibility into the actual response times end users experience is the ultimate gauge for
performance, both for system and service level monitoring. Look for solutions that can do synthetic simulations
of transactions in order to provide accurate insights into availability and performance from the end user’s
perspective.
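A synthetic transaction can be thought of as a scripted set of user steps that is run on a schedule and timed. The sketch below shows the idea in its simplest form; the function names, the threshold, and the stand-in transaction are assumptions for illustration, and a real monitoring product would drive the actual application rather than sleep.

```python
import time
from typing import Callable, Dict


def run_synthetic_transaction(transaction: Callable[[], None],
                              threshold_seconds: float) -> Dict[str, object]:
    """Run one scripted transaction and report availability and response time."""
    start = time.perf_counter()
    try:
        transaction()
        succeeded = True
    except Exception:
        # Any failure in the scripted steps counts as the service being unavailable.
        succeeded = False
    elapsed = time.perf_counter() - start
    return {
        "available": succeeded,
        "response_seconds": elapsed,
        "within_threshold": succeeded and elapsed <= threshold_seconds,
    }


def sample_order_lookup() -> None:
    """Stand-in for real scripted steps (log in, look up an order, log out)."""
    time.sleep(0.05)


if __name__ == "__main__":
    print(run_synthetic_transaction(sample_order_lookup, threshold_seconds=2.0))
```

The point of the design is that the measurement is taken from the outside, in the same way a user would experience the service, rather than inferred from the health of individual elements.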
Pitfall #2: Generating Volumes of Reports that Don’t Provide Service Insights
The Scenario
In response to regulatory requirements like HIPAA and SOX, a large medical company began generating lots of
service level reporting data. In highly regulated companies, management will often demand that massive
volumes of reports get generated, with the reasoning being that reporting on every potential variable will help
demonstrate compliance and ensure audits are passed. In fact, these regulations generally don’t specify that
service level reporting is required at all.
In this case, the company’s IT organization delivered reports that addressed service levels and covered
almost every possible variable, but ultimately too much information was being handed to line managers.
In one case, huge reports were being delivered to the DBA manager, but they offered no means
of getting executive-level summaries, so neither she nor anyone on her team had the time to review them.
Ultimately, these reports weren’t empowering the DBA manager to get real insights for improving
service levels, reducing costs, or streamlining processes.
The Solution
Regulatory mandates often provide only high-level requirements, leaving a lot of flexibility in terms of
interpreting how to apply them to the specifics of your organization. In reality, initiatives for SLM can be
instrumental in supporting regulatory requirements, while also being orchestrated to meet the needs of the
business.
Effective service level reports can be instrumental in helping maintain control over regulated health care data
and control over the systems that manage that data, so they can help attest to HIPAA compliance. Similarly,
they can help demonstrate the integrity of financial data and associated systems and so help attest to SOX
compliance.
Often, spurred by regulation, management encourages IT to manage by numbers, but you need to make sure
the numbers are relevant. Likewise, there are numbers that are easy for IT to generate, but aren’t ultimately
meaningful. It is essential to ensure monitoring mechanisms provide meaningful insights into defined services
and demonstrate whether service commitments are being met.
As the manager in charge of SLM for your organization, you have a balance to strike. On one hand, generating few
or no numbers isn’t useful. On the other hand, burying a report recipient in massive volumes of information
doesn’t provide real value either. All too often, I’ve seen SLM initiatives get derailed by all the minutiae that can
be monitored, rather than focusing on the few specific metrics that need to be tracked. Ultimately, you need to
decide, of the plethora of reports and metrics available, what is the best measure, or handful of measures
available, and start with those.
Start simple. Often, organizations find that simplicity ultimately suffices. Many organizations find that taking a
pragmatic approach to SLM not only helps in the near term, but sets the stage for long term improvements.
With these initiatives, being smart about prioritizing and starting small can pay long-term dividends.
For example, one very practical approach to take is to start with components and with the specific operational
requirements of those components, and bring that to the business, bearing in mind that these components are part
of a service. This initial dialog can be a useful exercise that helps educate each participant, establish a common
language for discussing service levels, and lay effective groundwork upon which SLAs can ultimately
be built.
The role of organizational change management is another important aspect to consider. For the IT managers
tasked with reporting on SLAs to the business, it is often challenging to get all the IT teams involved in
delivering that service to contribute to this overarching, service-led approach. Ultimately, effective
organizational change management is required to overcome these challenges.
Teams need to understand why this SLM initiative is important, have the knowledge and ability to contribute,
and understand their role in contributing to business success. In the early stages, it is critical to take questions
and concerns seriously, and to respond to them so that there’s buy-in from all
key constituents. The companies that accomplish these objectives most effectively are the ones that
tie this contribution to MBOs, performance reviews, and incentives.
The Monitoring Requirements
All too often, monitoring tools have made administrators choose between two very unappealing options:
- Incurring the huge costs of purchasing and configuring one of the legacy monitoring systems, even though only a fraction of their capabilities are required.
- Selecting a point solution that may meet near-term needs, but that won’t scale to meet longer-term requirements.
In most circumstances, the ideal monitoring tool is one that is easy to use and deploy, while at the same time
offering the sophistication and broad infrastructure coverage that enable it to meet long term monitoring
needs. At the end of the day, each technology silo that supports business services needs to be monitored and
reported on. Consequently, the monitoring solution should offer this end-to-end reach and also have
capabilities for reporting on end-user response metrics.
The monitoring tool selected should also provide service-centric report data. Having one tool that provides all
this visibility is essential. If multiple groups are providing metrics from multiple tools, and those metrics are
conflicting, getting a clear picture of service levels is difficult, if not impossible.
Look for sophisticated alerting and alarming capabilities that can be triggered from service level status so that, for
example, alerts are generated not only after an SLA is breached, but also beforehand, if an SLA is in danger of being
breached given existing trends.
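One simple way to flag an SLA that is in danger of being breached is to extrapolate the downtime recorded so far to the end of the reporting period. The sketch below uses a straight-line projection; the function names, the status labels, and the sample figures are assumptions for illustration, and commercial tools may use more refined trend models.

```python
def projected_downtime(downtime_so_far: float, elapsed_days: float,
                       period_days: float) -> float:
    """Linearly extrapolate downtime (minutes) to the end of the reporting period."""
    if elapsed_days <= 0 or elapsed_days > period_days:
        raise ValueError("elapsed_days must be in (0, period_days]")
    return downtime_so_far * period_days / elapsed_days


def sla_status(downtime_so_far: float, elapsed_days: float,
               period_days: float, budget_minutes: float) -> str:
    """Return 'breached', 'at_risk', or 'ok' for the current reporting period."""
    if downtime_so_far > budget_minutes:
        return "breached"
    if projected_downtime(downtime_so_far, elapsed_days, period_days) > budget_minutes:
        return "at_risk"
    return "ok"


if __name__ == "__main__":
    # Hypothetical figures: 30 minutes of downtime after 10 of 30 days,
    # measured against an allowance of 43.2 minutes for the period.
    print(sla_status(30.0, 10, 30, 43.2))
```

With these sample figures, the SLA has not yet been breached, but the projection of 90 minutes exceeds the allowance, so the status is reported as at risk while there is still time to act.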
Finally, having flexibility is also key in reporting. Look for the flexibility to create customized, ad hoc reports on
demand. For example, if an auditor is on premises, being able to generate reports immediately is vital. On the
other hand, those reports may only be needed once, or very infrequently, so having the control to determine
whether or not reports get generated routinely is also important.
Pitfall #3: Generating SLA Data, but Not Using It
The Scenario
After making significant investments in a service level management infrastructure, a fast growing service
provider was generating effective SLA reports that were providing useful service level insights. The problem?
With over 450 major clients and hundreds of potential reports that could be produced for each client, the service
provider’s staff was having a hard time keeping up. With increasing frequency, the data being generated wasn’t
being acted upon—or for that matter even consistently reviewed.
Across clients, no consistent processes were being employed to determine specifically which subset of
reports needed to be the focus of ongoing analysis. No automated alarming processes were in place to ensure
that appropriate staff members were immediately apprised of significant issues as they arose. Finally, within
account teams, no established process was in place to ensure someone was accountable and available for
reviewing the data and ensuring it got used.
The Solution
Even if an organization has effectively defined the services to be monitored, and ensured that the service level
monitoring mechanisms in place deliver real insights, there’s still a critical third step: establishing and
formalizing processes for reviewing and acting on this data.
A process should be based first on a clear, agreed upon definition of what the objectives of SLA reporting are.
All too often, I’ve seen companies adding to the amount of data being generated without clearly defining the
objectives first. In those cases, before generating more data, administrators need to start with objectives and
decide which reports are required to meet those objectives. Next, management needs to look at the scope of
the project, and then determine what resources are required. Then, and only then, can the IT administrators
begin developing workflows in a realistic, effective manner.
Some people get fooled into thinking a single tool or a workflow is all that’s required, but even with tools and
workflows in place, the job is not done. Processes must include tools and workflows, but they must also specify
metrics, inputs and outputs, roles and responsibilities, standards, and purpose and scope. The human
resources element of this is vital: Who’s responsible for evaluating this information, and how often? Are they
available to take on this task on a sustained basis? Have they been apprised? Often, a lot of these basic
questions get overlooked, and consequently the final results suffer.
Finally, it is important to remember that process definition is not a one-time deal. IT management is well
served by implementing basic processes initially to help establish consistency, then refining and building on
those processes over time. It’s very important to stay flexible in this regard. In some cases, I’ve found that an
organization will start by developing monthly SLA reports, but over time learn that what the business really
wants is access to real-time dashboards, and that’s perfectly acceptable. Managers should expect processes
and procedures to evolve over time—it’s essential to ensuring they remain effective.
The Monitoring Requirements
Monitoring tools can play a vital role in optimizing the effectiveness of the processes in place. Look for tools
that offer the flexibility to be adapted to existing and evolving processes. The IT team needs to develop
processes that work for them, based on a clear understanding of objectives, scope, and resources. These
processes should not have to be tailored to the rigid methodologies of a complex monitoring solution.
IT processes can’t be done in a vacuum. They need to be developed and revised as maturity, objectives, and
environments evolve. No one process will work for every organization, and it’s vital that monitoring tools have
the flexibility to be tailored to an array of processes and to be adaptable as those processes change over time.
In addition, alarming capabilities are vital in supporting the success of any process. For example,
beyond the massive volumes of reports being generated, are there some kinds of alarms that are going to be
generated to indicate that an SLA has been breached?
Even more importantly, are there mechanisms in place so that, before the end of a given reporting period, the
appropriate team members are notified that an SLA is in danger of being breached? For example, if a system
gets hit by a significant outage, most of the associated team will already be aware of a potential SLA breach.
However, what happens if a series of brief outages has occurred that stayed under the staff’s radar, even though the
risk of an SLA breach is just as high? That’s why having automated alerting
specifically for reporting on potential SLA breaches is so critical. Without configuring and automating these
types of alerts, even the best orchestrated processes may ultimately fail.
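To illustrate the brief-outage case, the sketch below adds up every outage in the reporting period, however short, and raises a warning once the running total reaches a set fraction of the allowed downtime. The function names, the 75% warning level, and the sample outages are hypothetical.

```python
from typing import List


def budget_consumed(outage_minutes: List[float], budget_minutes: float) -> float:
    """Fraction of the period's downtime allowance used by all outages so far."""
    if budget_minutes <= 0:
        raise ValueError("budget_minutes must be positive")
    return sum(outage_minutes) / budget_minutes


def should_warn(outage_minutes: List[float], budget_minutes: float,
                warn_fraction: float = 0.75) -> bool:
    """True once cumulative downtime reaches the warning fraction of the allowance."""
    return budget_consumed(outage_minutes, budget_minutes) >= warn_fraction


if __name__ == "__main__":
    # Hypothetical: twelve 3-minute blips, none long enough to draw attention alone.
    blips = [3.0] * 12
    used = budget_consumed(blips, 43.2)
    print(f"allowance used: {used:.0%}, warn: {should_warn(blips, 43.2)}")
```

No single outage in this example would be likely to trigger a page, but together they consume most of the period’s allowance, which is exactly the situation an automated alert needs to catch.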