


Pitfall #1: Monitoring Components vs. Services

The Scenario

Once, I was asked by a large manufacturing company to assess their ITIL practices, including SLM. On my first visit to their offices, I found extensive SLA reports on display throughout the various IT departments, printed large and hanging on the walls.
The network team had reports detailing network performance metrics. The server group had reports on server uptime. This company had invested significant money and resources in these SLA monitoring capabilities, but the reports didn’t specifically address the services critical to the company, such as the service levels of their manufacturing automation system.
Ultimately, business management didn’t feel like end users were experiencing an improvement in service levels, in spite of the huge investments. Further, IT management suffered from a loss in credibility. They came into meetings with business management and delivered reports that demonstrated glowing numbers for performance and uptime. But those numbers didn’t jibe with what end users were actually experiencing.
The Solution
Fundamentally, this scenario illustrates a failure in service definition, and it’s a very common pitfall. From the beginning, you need to involve representatives from the business, or representatives from IT who act as liaisons to business management, to determine what the business priorities are. Based on these priorities, the services that need to be monitored should be defined.
As part of this process, requirements and thresholds need to be negotiated as well. For example, in defining service level reports, one IT organization assumed that the business required a greater degree of uptime than it actually did. The result? They spent a great deal on monitoring and infrastructure enhancements that were not necessary. Similarly, IT needs to be cautious of overcommitting, especially early on in this process, and setting groups or individuals up for failure.
Engage with business representatives, early and often. Ultimately, by doing so, you can ensure that there’s a common language for evaluating performance, so priorities get addressed, and IT can most effectively demonstrate the value it delivers.
Another important consideration has to do with reporting, and tailoring the delivery of information to the intended audience. At a high level, there are three key audiences for SLA reports: the business, the IT management responsible for reporting to the business, and the IT management responsible for OLAs. Having an understanding of which audience is going to be reviewing a given report is essential to ensuring the effectiveness of that communication.
In the manufacturing company scenario described above, if the IT group had involved business representatives early on, they would have realized that the uptime or performance metrics of any specific system, whether database servers or network components, weren’t what was important to that audience. What was important were the reliability and responsiveness delivered by the core business services that users counted on every day. On the other hand, having robust monitoring of the performance and uptime of specific elements is critical for those IT managers responsible for OLAs.

The Monitoring Requirements
The capabilities of monitoring tools can play a vital role in service definition. If you run simplistic monitoring, such as pinging, you may see that the machine is up, but that doesn’t necessarily ensure that the machine is doing what the business needs it to. Look for solutions that move beyond simple up/down monitoring to offer more extensive insights into the health and performance of specific infrastructure elements.
Further, look for solutions that offer insights beyond specific elements that underpin a given service to enable service level monitoring. This requires an ability to aggregate silo-centric data to obtain a service-centric perspective—including capabilities for consolidating silo data, processing that silo data in a unified fashion against service-level metrics, and displaying service quality metrics. In addition, it is important that an end-to-end infrastructure view is provided that enables administrators to quickly detect and isolate any events or outages that can affect end user performance.
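To make the difference between silo-centric and service-centric numbers concrete, here is a minimal sketch of rolling per-silo outage records up into a single service availability figure. The component names, the outage times, and the rule that the service is down whenever any one component is down are all assumptions made for illustration; a real service with redundancy would need a different rule.

```python
from datetime import datetime, timedelta


def merge_intervals(intervals):
    """Merge overlapping (start, end) outage intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def service_availability(component_outages, period_start, period_end):
    """Fraction of the period during which every component was up.

    Assumption: the service is down whenever any component is down.
    """
    clipped = [
        (max(start, period_start), min(end, period_end))
        for outages in component_outages.values()
        for start, end in outages
        if start < period_end and end > period_start
    ]
    downtime = sum(
        (end - start for start, end in merge_intervals(clipped)),
        timedelta(),
    )
    return 1 - downtime / (period_end - period_start)


# Hypothetical 30-day reporting period and silo outage data.
period_start = datetime(2024, 1, 1)
period_end = period_start + timedelta(days=30)
day2 = period_start + timedelta(days=2)
day9 = period_start + timedelta(days=9)
outages = {
    "network": [(day2, day2 + timedelta(minutes=60))],
    "server": [(day9, day9 + timedelta(minutes=60))],
    # Overlaps the server outage, so it adds no extra service downtime.
    "database": [(day9 + timedelta(minutes=30), day9 + timedelta(minutes=60))],
}

availability = service_availability(outages, period_start, period_end)
print(f"Service availability: {availability:.4%}")
```

In this made-up data, each silo on its own was down for at most 60 minutes in the period, but the service was down for 120 minutes, so the service figure is lower than any single silo report would suggest.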
Finally, gaining visibility into the actual response times end users experience is the ultimate gauge for performance, both for system and service level monitoring. Look for solutions that can do synthetic simulations of transactions in order to provide accurate insights into availability and performance from the end user’s perspective.
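As a rough illustration of what a synthetic transaction check does, the sketch below times one scripted transaction and reports whether it succeeded and whether it met a response time target. The function names, the two-second target, and the stand-in transaction are hypothetical; in practice the transaction would script the steps a real user performs against the service.

```python
import time


def run_synthetic_check(transaction, max_seconds):
    """Run one scripted transaction and report availability and response time.

    `transaction` is any zero-argument callable that performs the steps a
    user would perform and raises an exception if a step fails.
    """
    started = time.perf_counter()
    try:
        transaction()
        available = True
    except Exception:
        # Any failure counts as the service being unavailable to the user.
        available = False
    elapsed = time.perf_counter() - started
    return {
        "available": available,
        "response_seconds": elapsed,
        "within_target": available and elapsed <= max_seconds,
    }


def place_test_order():
    """Stand-in for a real scripted transaction (hypothetical)."""
    time.sleep(0.01)


result = run_synthetic_check(place_test_order, max_seconds=2.0)
print(result)
```

Run on a schedule from the locations where users actually sit, checks like this yield availability and response time figures from the end user's perspective rather than from any single component.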

Pitfall #2: Generating Volumes of Reports that Don’t Provide Service Insights

The Scenario
In response to regulatory requirements like HIPAA and SOX, a large medical company began generating lots of service level reporting data. In highly regulated companies, management will often demand that massive volumes of reports get generated, with the reasoning being that reporting on every potential variable will help demonstrate compliance and ensure audits are passed. In fact, these regulations generally don’t specify that service level reporting is required at all. In this case, the company’s IT organization delivered reports that addressed service levels, and addressed almost every possible variable, but ultimately too much information was getting handed over to line managers.
In one case, huge reports were being delivered to the DBA manager, but these reports offered no means of getting executive level summaries, so neither she nor anyone on her team had the time to review them. Ultimately, these reports weren’t empowering the DBA manager to get real insights for improving service levels, reducing costs, or streamlining processes.
The Solution
Regulatory mandates tend to provide high level requirements, leaving a lot of flexibility in interpreting how to apply them to the specifics of your organization. In reality, initiatives for SLM can be instrumental in supporting regulatory requirements, while also being orchestrated to meet the needs of the business.
Effective service level reporting can be instrumental in helping maintain control over regulated health care data and over the systems that manage that data, so it can help attest to HIPAA compliance. Similarly, it can help demonstrate the integrity of financial data and associated systems, and so help attest to SOX compliance.
Often, spurred by regulation, management encourages IT to manage by numbers, but you need to make sure the numbers are relevant. Likewise, there are numbers that are easy for IT to generate, but aren’t ultimately meaningful. It is essential to ensure monitoring mechanisms provide meaningful insights into defined services and demonstrate whether service commitments are being met.
As the manager in charge of SLM for your organization, you have a balance to strike. On one hand, generating few or no numbers isn’t useful. On the other hand, burying a report recipient in massive volumes of information doesn’t provide real value either. All too often, I’ve seen SLM initiatives get derailed by chasing all the minutiae that can be monitored, rather than focusing on the few specific metrics that need to be tracked. Ultimately, you need to decide, of the plethora of reports and metrics available, what is the best measure, or handful of measures, and start with those.
Start simple. Often, organizations find that simplicity ultimately suffices. Many organizations find that taking a pragmatic approach to SLM not only helps in the near term, but sets the stage for long term improvements. With these initiatives, being smart about prioritizing and starting small can pay long term dividends.

For example, one very practical approach is to start with components and with the specific operational requirements of those components, and bring that to the business, bearing in mind that these components are part of a service. This initial dialog can be a useful exercise that helps educate each participant, establish a common language for discussing service levels, and lay effective groundwork upon which SLAs can ultimately be built.
The role of organizational change management is another important aspect to consider. For the IT managers tasked with reporting on SLAs to the business, it is often challenging to get all the IT teams involved in delivering that service to contribute to this overarching, service-led approach. Ultimately, effective organizational change management is required to overcome these challenges.
Teams need to understand why this SLM initiative is important, have the knowledge and ability to contribute, and understand their role in contributing to business success. In the early stages, it is critical to take questions and concerns seriously, and to respond to team member concerns to ensure there’s buy-in from all key constituents. Ultimately, the companies that accomplish these objectives most effectively are the ones that tie this contribution to MBOs, performance reviews, and incentives.
The Monitoring Requirements
All too often, monitoring tools have made administrators choose between two very unappealing options:
Incurring the huge costs of purchasing and configuring one of the legacy monitoring systems, even though only a fraction of its capabilities is required.
Selecting a point solution that may meet near term needs, but that won’t scale to meet longer term requirements.
In most circumstances, the ideal monitoring tool is one that is easy to use and deploy, while at the same time offering the sophistication and broad infrastructure coverage that enable it to meet long term monitoring needs. At the end of the day, each technology silo that supports business services needs to be monitored and reported on. Consequently, the monitoring solution should offer this end-to-end reach and also have capabilities for reporting on end-user response metrics.
The monitoring tool selected should also provide service-centric report data. Having one tool that provides all this visibility is essential. If multiple groups are providing metrics from multiple tools, and those metrics are conflicting, getting a clear picture of service levels is difficult, if not impossible.
Look for sophisticated alerting and alarming capabilities that can be triggered from service level status so that, for example, alerts are generated not only after an SLA is breached, but beforehand, if an SLA is in danger of being breached given existing trends.
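One simple way to picture an alert that fires before a breach is a straight-line projection of the downtime seen so far in the reporting period. The sketch below is only that: the function names, the status labels, and the linear projection are assumptions for illustration, and a real tool may use more sophisticated trending.

```python
def minutes_allowed(target_availability, period_days):
    """Downtime budget, in minutes, implied by an availability target."""
    return (1 - target_availability) * period_days * 24 * 60


def sla_status(downtime_minutes, elapsed_days, period_days, target_availability):
    """Classify an SLA as 'breached', 'at_risk', or 'on_track'.

    'at_risk' means the SLA is still intact today, but a straight-line
    projection of the downtime seen so far would exceed the budget by
    the end of the reporting period.
    """
    if not 0 < elapsed_days <= period_days:
        raise ValueError("elapsed_days must be within the reporting period")
    budget = minutes_allowed(target_availability, period_days)
    if downtime_minutes > budget:
        return "breached"
    projected = downtime_minutes * period_days / elapsed_days
    if projected > budget:
        return "at_risk"
    return "on_track"


# Hypothetical 99.9% target over 30 days: a budget of about 43 minutes.
print(sla_status(20, elapsed_days=10, period_days=30, target_availability=0.999))
```

In this example, 20 minutes of downtime in the first 10 days is still within the budget, but the projection for the full period is 60 minutes, so the status is reported as at_risk while there is still time to act.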
Finally, having flexibility is also key in reporting. Look for the flexibility to create customized, ad hoc reports on demand. For example, if an auditor is on premises, being able to generate reports immediately is vital. On the other hand, those reports may only be needed once, or very infrequently, so having the control to determine whether or not reports get generated routinely is also important.

Pitfall #3: Generating SLA Data, but Not Using It

The Scenario
After making significant investments in a service level management infrastructure, a fast growing service provider was generating effective SLA reports that were providing useful service level insights. The problem?
With over 450 major clients and hundreds of potential reports that could be run for each client, the service provider’s staff was having a hard time keeping up. With increasing frequency, the data being generated wasn’t being acted upon, or for that matter even consistently reviewed.
Across clients, no consistent processes were being employed to determine specifically which subset of reports needed to be the focus of ongoing analysis. No automated alarming processes were in place to ensure that appropriate staff members were immediately apprised of significant issues as they arose. Finally, within account teams, no established process was in place to ensure someone was accountable and available for reviewing the data and ensuring it got used.
The Solution
Even if an organization has effectively defined the services to be monitored, and ensured that the service level monitoring mechanisms in place deliver real insights, there’s still a critical third step: establishing and formalizing processes for reviewing and acting on this data.
A process should be based first on a clear, agreed-upon definition of what the objectives of SLA reporting are. All too often, I’ve seen companies adding to the amount of data being generated without clearly defining the objectives first. In those cases, before generating more data, administrators need to start with objectives and decide which reports are required to meet those objectives. Next, management needs to look at the scope of the project, and then determine what resources are required. Then, and only then, can the IT administrators begin developing workflow in a realistic, effective manner.
Some people get fooled into thinking a single tool or a workflow is what’s required, but even with tools and workflows in place, the job is not done. Processes must include tools and workflow, but they must also specify metrics, inputs and outputs, roles and responsibilities, standards, and purpose and scope. The human resources element of this is vital: Who’s responsible for evaluating this information, and how often? Are they available to take on this task on a sustained basis? Have they been apprised? Often, a lot of these basic questions get overlooked, and consequently the final results suffer.
Finally, it is important to remember that process definition is not a one-time deal. IT management is well served by implementing basic processes initially to help establish consistency, then refining and building on those processes over time. It’s very important to keep flexible in this regard. In some cases, I’ve found that an organization will start by developing monthly SLA reports, but over time learn that what the business really wants is access to real-time dashboards, and that’s perfectly acceptable. Managers should expect processes and procedures to evolve over time—it’s essential to ensuring they remain effective.
The Monitoring Requirements
Monitoring tools can play a vital role in optimizing the effectiveness of the processes in place. Look for tools that offer the flexibility to be adapted to existing and evolving processes. The IT team needs to develop processes that work for them, based on a clear understanding of objectives, scope, and resources. These processes should not have to be tailored to the rigid methodologies of a complex monitoring solution.
IT processes can’t be defined in a vacuum. They need to be developed and revised as maturity, objectives, and environments evolve. No one process will work for every organization, and it’s vital that monitoring tools have the flexibility to be tailored to an array of processes and to be adaptable as those processes change over time.
In addition, alarming capabilities are vital in supporting the success of any process. For example, beyond the massive volumes of reports being generated, are there alarms that will be generated to indicate that an SLA has been breached?
Even more importantly, are there mechanisms in place so that, before the end of a given reporting period, the appropriate team members are notified that an SLA is in danger of being breached? For example, if a system gets hit by a significant outage, most of the associated team will already be aware of a potential SLA breach.
However, what happens if a series of brief outages occurs that stays under the staff’s radar, even though an SLA breach may be just as likely? That’s why having automated alerting, specifically for reporting on potential SLA breaches, is so critical. Without configuring and automating these types of alerts, even the best orchestrated processes may ultimately fail.
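The brief-outage case is easy to sketch: keep a running total of downtime for the reporting period and raise a warning once a set share of the allowed downtime has been used. The class name, the 80% warning level, and the 43.2-minute budget (roughly what a 99.9% target over 30 days allows) are assumptions chosen for illustration.

```python
class DowntimeBudget:
    """Tracks cumulative downtime in a reporting period against an SLA budget."""

    def __init__(self, allowed_minutes, warn_fraction=0.8):
        self.allowed_minutes = allowed_minutes
        self.warn_fraction = warn_fraction
        self.used_minutes = 0.0

    def record_outage(self, minutes):
        """Add one outage and return 'ok', 'warning', or 'breached'."""
        self.used_minutes += minutes
        if self.used_minutes > self.allowed_minutes:
            return "breached"
        if self.used_minutes >= self.warn_fraction * self.allowed_minutes:
            return "warning"
        return "ok"


# Eleven hypothetical four-minute outages, none notable on its own.
budget = DowntimeBudget(allowed_minutes=43.2)
statuses = [budget.record_outage(4) for _ in range(11)]
print(statuses)
```

Here the first eight outages leave the status at ok, the ninth and tenth return warning, and the eleventh returns breached. The warning is the point at which the appropriate team members would be notified, even though no single outage was large enough to draw attention.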


