ITIL Incident Management is part of IT Service Operation within IT Service Management (ITSM), and the Incident Manager is the process owner. Businesses constantly face technology-related issues, and the Incident Management process helps them resolve those issues quickly and restore service, so that business operations run smoothly and productivity recovers with minimal downtime. This guide discusses the ITIL Incident Management process and how it supports the enterprise.
What is ITIL Incident Management?
In ITIL 3, an ‘incident’ is defined as an unplanned interruption to an IT Service or reduction in the quality of an IT Service. Failure of a configuration item (CI) that has not yet impacted service is also an incident, for example, failure of one disk from a mirror set.
Incident Management is the process for dealing with all incidents; this can include failures, questions, or queries reported by the users (usually via a telephone call to the Service Desk), by technical staff, or automatically detected and reported by event monitoring tools.
What is the Purpose of ITIL Incident Management?
ITIL 4 describes the purpose of the Incident Management practice as minimizing the negative impact of incidents by restoring normal service operation as quickly as possible. It defines an incident as an unplanned interruption to a service or a reduction in the quality of a service.
The process of dealing with all the incidents throughout the service lifecycle is known as Incident Management. The primary goal of the Incident Management process is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. ‘Normal service operation’ is defined here as service operation within Service Level Agreement (SLA) limits.
Incident Management looks for a workaround or temporary fix rather than a permanent solution. It is a defined process for logging, recording, and resolving incidents.
Examples of Incidents
User Experienced Incidents
- Service not available (this could be due to either the network or the application, but at first the user will not be able to determine which)
- Error message when trying to access the application
- Application bug or query preventing the user from working
- Disk space full
- Technical Incident
- System down
- Printer not printing
- New hardware, such as scanner, printer, or digital camera, not working
What are Technical Incidents?
Technical incidents can occur without the user being aware of them. There may be a slower response on the network or on individual workstations but, if this is a gradual decline, the user may not notice.
Technicians using diagnostics or proactive monitoring usually spot technical incidents. If a technical incident is not resolved, it can affect many users for a long time. In time, experienced users and the Service Desk learn to spot these incidents before they affect most users.
Examples of Technical Incidents:
- Disk space nearly full (this will affect users only when it is completely full)
- Network card intermittent fault – sometimes it appears that the user cannot connect to the network, but on a second attempt the connection works. Replacing the card before it stops working completely provides more benefit to the users
- Monitor flickering – it is more troublesome in some applications than others. Although the flicker may be easy to live with or ignore, the monitor will not usually last more than a few weeks in this state
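The "disk space nearly full" example lends itself to a small proactive check. The following is a minimal sketch only: the 90% threshold, the function names, and the returned labels are assumptions made for illustration and are not defined by ITIL or by this guide.

```python
import shutil

# Hypothetical threshold: the guide does not define one.
WARNING_THRESHOLD = 0.90  # treat 90% used as "nearly full"


def classify_disk_usage(used_bytes: int, total_bytes: int,
                        threshold: float = WARNING_THRESHOLD) -> str:
    """Return 'ok', 'technical_incident' (nearly full) or 'user_incident' (full)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    ratio = used_bytes / total_bytes
    if ratio >= 1.0:
        return "user_incident"       # users are now affected
    if ratio >= threshold:
        return "technical_incident"  # log it before users notice
    return "ok"


def check_path(path: str = "/") -> str:
    """Classify the real usage of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return classify_disk_usage(usage.used, usage.total)
```

Run on a schedule, a check like this could log a technical incident while the disk is only nearly full, which is the point at which fixing it costs users nothing.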
Incident Management Basic Concepts
- Timescales
- Depend on the priority defined
- Should be documented in Operational Level Agreements (OLAs) and Underpinning Contracts (UCs)
- Incident Models
- Based on predefined steps to handle a particular incident
- A detailed description of the steps
- Major Incident
- A break in service which threatens to cause or may cause a loss to the business
- A separate procedure including shorter timescales and greater urgency is used
What is the ITIL Major Incident Management Process?
A separate procedure, with shorter timescales and greater urgency, must be used for ‘major’ incidents. A definition of what constitutes a major incident must be agreed and ideally mapped onto the overall incident prioritization system, so that such incidents are dealt with through the major incident process. Agree with the business the rules that determine when the major incident process applies.
- Handling major incidents (MINCs) within IT and the business
- The trigger for the major incident process is setting the priority of an IT incident ticket to “CRITICAL”
- Only key users designated by management and IT are allowed to set this priority
- Root Cause Analysis (RCA) reports for all MINCs are reviewed and approved by the CIO / IT Management Team
- Build Major Incident Process Teams
- IT Service Desk: works in the incident management tool (for example, ServiceNow)
- MINC Resolution Team (MRT) – technical and functional teams that resolve the MINC
- MINC Manager (On-Call) – leads the MINC and performs the Root Cause Analysis (RCA)
- MINC Management Team – accepts or rejects the MINC and re-prioritizes if needed
- Management IT (MIT) – accepts the RCA and drives continuous improvements
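The trigger rule above (a “CRITICAL” priority set by an authorized key user starts the major incident process) can be sketched in a few lines. The names `Ticket`, `KEY_USERS`, and `set_priority` are hypothetical and do not come from any particular service desk tool; a real tool would enforce this through its own roles and workflow configuration.

```python
from dataclasses import dataclass

# Hypothetical list of key users designated by management and IT.
KEY_USERS = {"alice", "bob"}


@dataclass
class Ticket:
    ticket_id: str
    priority: str = "MEDIUM"
    major_incident: bool = False


def set_priority(ticket: Ticket, new_priority: str, requested_by: str) -> Ticket:
    """Set the priority; CRITICAL starts the major incident process."""
    if new_priority == "CRITICAL":
        if requested_by not in KEY_USERS:
            raise PermissionError(f"{requested_by} may not set CRITICAL priority")
        ticket.major_incident = True  # trigger: hand over to the MINC Manager
    ticket.priority = new_priority
    return ticket
```

Keeping the permission check next to the trigger means an ordinary user cannot start the major incident process by accident.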
What is the Objective of the Incident Management Process?
The objectives of the ITIL Incident Management process are:
- To ensure standardization of methods and procedures used for an efficient and prompt response.
- To analyze, document, and report incidents during the management process.
- To increase visibility and communication of incidents to business and IT support staff.
- To align incident management activities and priorities with those of the business.
Scope of ITIL Incident Management
Managing any disruption or potential disruption to live IT services is the primary scope of incident management. This includes events that are:
- Reported directly by users through the Service Desk
- Passed through an interface from Event Management to Incident Management tools
- Reported or logged by technical staff
Incident Management Value to Business
The value of Incident Management includes:
- The ability to detect and resolve incidents, which results in lower downtime to the business, which in turn means higher availability of the service. This means that the business is able to exploit the functionality of the service as designed.
- The ability to align IT activity to real-time business priorities. This is because Incident Management includes the capability to identify business priorities and dynamically allocate resources as necessary.
- The ability to identify potential improvements to services. This happens as a result of understanding what constitutes an incident and also from being in contact with the activities of business operational staff.
- The Service Desk can, during its handling of incidents, identify additional service or training requirements found in IT or the business.
Incident Management is highly visible to the business, and it is, therefore, easier to demonstrate its value than most areas in Service Operation. For this reason, Incident Management is often one of the first processes to be implemented in Service Management projects. The added benefit of doing this is that Incident Management can be used to highlight other areas that need attention, thereby providing a justification for expenditure on implementing other processes.
Differences between Incident Management and Problem Management
Problem Management differs from Incident Management in that its main goal is the detection of the underlying causes of an incident and the best resolution and prevention. In many situations, the goals of problem management can be in direct conflict with the goals of incident management.
Deciding which approach to take requires careful consideration. A sensible approach is to restore the service as quickly as possible (Incident Management) while ensuring that all details are recorded. This enables problem management to continue once a workaround has been implemented.
Discipline is required, as the idea that the incident is fixed is likely to prevail. However, the incident may well appear again if the resolution to the problem is not found.
Incident versus Problem
An incident is where an error occurs: something does not work the way it is expected.
This is often described as:
- A fault
- An error
- It does not work!
- A problem
But the ITIL term for all of these is an incident.
A problem is different and can be:
- The occurrence of the same incident many times
- An incident that affects many users
- The result of network diagnostics revealing that some systems are not operating in the expected way
A problem can exist without having an immediate impact on the users, whereas incidents are usually more visible and the impact on the user is more immediate.
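The first sign of a problem listed above, the same incident occurring many times, can be spotted by counting repeats in the incident log. This is a rough sketch under stated assumptions: the threshold of three and the `(configuration item, category)` record shape are hypothetical, and a real service desk tool would provide its own reporting for this.

```python
from collections import Counter

# Hypothetical threshold: how many repeats suggest an underlying problem.
RECURRENCE_THRESHOLD = 3


def problem_candidates(incidents, threshold=RECURRENCE_THRESHOLD):
    """Return the (configuration_item, category) pairs seen at least `threshold` times.

    `incidents` is an iterable of (configuration_item, category) tuples.
    """
    counts = Counter(incidents)
    return sorted(key for key, n in counts.items() if n >= threshold)
```

Pairs returned by such a count are only candidates; whether they are raised as problem records remains a decision for problem management.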
Why use Incident Management? / Benefits of ITIL Incident Management
There are major benefits to be gained by implementing an Incident Management Process:
- Improved information to customers/users on aspects of service quality
- Improved information on the reliability of equipment
- Better staff confidence that a process exists to keep IT services working
- The certainty that incidents logged will be addressed and not forgotten
- Reduction of the impact of incidents on the business/organization
- Resolving the Incident first rather than the problem, which will help in keeping the service available (but beware of too many quick fixes that problem management does not ultimately resolve)
- Working with knowledge about the configuration and any changes made, which will enable you to identify the cause of incidents quickly
- Improved monitoring and ability to interpret the reports, which will help to identify Incidents before they have an impact
What happens if Incident Management is not used?
Failing to implement Incident Management may result in:
- No one managing and escalating incidents
- Unnecessary severity of incidents and increased likelihood of impact on other areas (for instance, a full disk will prevent printing, saving work and copying files)
- Technicians being asked to do routine tasks such as clear paper jams, repair a broken monitor that has merely had the power disconnected, or fix a disk error when a floppy disk was left in during reboot
- Specialist support staff being subject to constant interruption, making them less effective
- Your other staff being disrupted as people ask their colleagues for advice
- Frequent reassessment of incidents from first principles rather than referring to existing solutions, such as the knowledge database
- Lack of coordinated management information
- Forgotten, incorrectly handled, or badly managed incidents
Issues with deciding on an Incident Management Process
Prepare to overcome:
- Absence of visible management or staff commitment, resulting in non-availability of resources for implementation
- Lack of clarity about the business/organization’s needs
- Out of date working practices
- Poorly defined objectives, goals, and responsibilities
- Absence of knowledge for resolving incidents
- Inadequate staff training
- Resistance to change
Who uses Incident Management?
Any organization that needs to understand its technical support requirements should start with implementing a service desk, closely followed by a defined Incident Management Process.
It will help to channel all incidents through a single point of contact (service desk) so that someone is responsible for following them through to a speedy resolution. Most organizations that rely on IT services need to know how their ICT systems/IT services are functioning, what is failing, and how long systems are unavailable.
The reports produced in the process of incident management focus on the performance of equipment, and not on the technical issues that created the incidents.
How Incident Management works / Incident Management Process
Incident management is about understanding the incident life cycle and the actions to take at each stage.
Inputs to the incident process
- Incident details logged at the service desk
- Configuration details from the configuration management database
- Output from problem management and known errors
- Resolution details from other incidents
- Responses to requests for change
Output from the incident process
- Incident resolution and closure
- Updated incident record and call log
- Methods for workarounds
- Communication with the user
- Requests for change
- Management information (reports)
- Input to the problem management process
Activities of the Incident process
- Incident detection and recording
- Initial user support by the single point of contact (service desk)
- Investigation and diagnosis
- Resolution and recovery of service
- Incident closure
- Incident ownership, monitoring, and communication
Incident Management Implementation Requirements
- Before identifying your needs, consider what you want to achieve
- This is an opportunity to re-evaluate the way you have, to date, approached and fixed incidents. Rethink current processes and activities
- Understand the difference between incident management and problem management
- Technical staff will always try to solve the cause of a problem. Their way of thinking needs to change so that they approach it with incident management before problem management
- Choose which areas to improve and which processes to remove
- You need to sell the idea to the other staff, so make it appeal to yourself first
Incident Management Process Flow & Life Cycle Diagram
ITIL Incident Management Roles and Functions in the Incident Process
Service desk role in Incident Management
Service desk responsibilities include:
- Logging the incident in the call log
- Performing the initial Incident diagnostics
- Requesting technician support when required
- Owning, monitoring, and communicating
- Updating records (call log, incident sheet) with the resolution
- Closing incidents
- Progressing any follow up action (for example, following through into problem management)
Technical support role in Incident Management
The technician’s role in incident management has the same focus – to restore the service as soon as possible. The technician will keep the service desk informed at all stages.
Other Incident Management Roles
Additional first line support groups, such as configuration management or change management specialists should be consulted.
Second- and third-line support groups, including specialist support groups and external suppliers should be consulted as necessary.
Users should keep the service desk informed of any further changes to the state of the affected equipment (sometimes computers start working again when different incidents are resolved).
Prepare to Implement Incident Management
- Implement the service desk first
- Decide how incident management will interface with the service desk
- Decide who will take on the responsibility of incident management
- Make sure that management commitment, budget, and resources are made available before you consider setting up incident management
- Ensure that the proposed solution aligns with your business/organization’s strategy and vision
- Define clear objectives and deliverables
- Involve and consult IT staff
- Sell the benefits to the support staff – implementing incident management will need a change of behaviour from IT staff as well as users
- Plan the incident management process training
- Service desk training is the priority
- Incident management training – who, when
- Decide what to measure and report
- Before making changes, you must understand the levels of service you are currently providing with the current resources available
- Produce a report on the number of calls currently logged, the time taken to resolve them and the time the equipment is unavailable – this is your baseline
- Set targets for a manageable number of objectives for the effectiveness of incident management
- Decide what incident management reports are required
- Ensure that the incident management process is regularly reviewed
Incident Management Post Implementation Review
It is the users’ perception rather than availability statistics or transaction rates that, in the end, defines whether the service is meeting their needs.
User satisfaction Analysis and Surveys
Satisfaction surveys are an excellent method of monitoring user perception and expectation and can be used as a powerful marketing tool. However, to ensure success you should address several key points:
- Decide on the scope of the survey
- Decide on the target audience
- Clearly define the questions
- Make the survey easy to complete
- Conduct the survey regularly
- Make sure that your users understand the benefits
- Publish the results
- Follow through on survey results
- Translate survey results into actions
How to manage Incident Management Statuses?
Incident statuses mirror the incident process and are as follows:
- New
- Assigned
- In Progress
- On Hold (pending or awaiting user information)
- Resolved
- Closed
The New status indicates that the service desk has received the incident but has not assigned it to a group or technician. Response time is an important metric for this status.
The Assigned status means that the incident has been assigned to an individual technician or group on the service desk.
The In Progress status indicates that an incident has been assigned to a group or technician but has not been resolved. The technician is actively working with the user to diagnose and resolve the incident.
The On Hold status indicates that the incident is waiting for information or a response from the user or a third party. The incident is placed “on hold” so that SLA response deadlines are not exceeded while waiting for a response from the user, vendor, or others.
The Resolved status means that the service desk has confirmed that the incident is resolved and that the user’s service has been restored to SLA levels. At this stage, users can re-open the incident within a defined period if the issue is not actually resolved, before the incident is closed.
The Closed status indicates that the incident is resolved and that no further actions can be taken.
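The status descriptions above can be read as a small state machine. The sketch below is one possible reading, not a prescribed ITIL model: the transition table is inferred from the descriptions (including the re-open path from Resolved back to In Progress), and the names `Status`, `ALLOWED`, and `transition` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"
    IN_PROGRESS = "In Progress"
    ON_HOLD = "On Hold"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Assumed transitions, inferred from the status descriptions.
ALLOWED = {
    Status.NEW: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.ON_HOLD, Status.RESOLVED},
    Status.ON_HOLD: {Status.IN_PROGRESS},
    Status.RESOLVED: {Status.CLOSED, Status.IN_PROGRESS},  # user re-opens
    Status.CLOSED: set(),  # no further action can be taken
}


def transition(current: Status, target: Status) -> Status:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target
```

Making the allowed moves explicit also makes the time spent in each status easy to record, because every change passes through a single function.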
Incident management follows incidents through the service desk to track trends in incident categories and time in each status. The final component of incident management is the evaluation of the data gathered. Incident data guides organizations to make decisions that improve the quality of service delivered and decrease the overall volume of incidents reported. Incident management is just one process in the service operation framework.
Measurements of Incident Management
- Do not set targets that cannot be measured
- Ensure that users are aware of what you are doing, and why
- Establish a baseline before discussing formal Service Level Agreements (SLAs) with customers
- Maintain measurements of what is necessary and viable. For instance, if your staff think that they need feedback on response times, then measure them
ITIL 4 Incident Management Contribution to the Service Value Chain (SVC)
Incident management contributes to the service value chain, with the practice applied mainly to the engage and the deliver and support value chain activities. Except for plan, the other activities may use information about incidents to help set priorities:
Improve: Incident records are a key input to improvement activities and are prioritized both in terms of incident frequency and severity.
Engage: Incidents are visible to users, and significant incidents are also visible to customers. Good incident management requires regular communication to understand the issues, set expectations, provide status updates, and agree that the issue has been resolved so the incident can be closed.
Design and transition: Incidents may occur in test environments, as well as during service release and deployment. The practice ensures these incidents are resolved in a timely and controlled manner.
Obtain/build: Incidents may occur in development environments. The incident management practice ensures these incidents are resolved in a timely and controlled manner.
Deliver and support: Incident management makes a significant contribution to support. This value chain activity includes resolving incidents and problems.
Incident Management Interfaces with Other ITIL Processes
The interfaces with Incident Management include:
- Problem Management: Incident Management forms part of the overall process of dealing with problems in the organization. Incidents are often caused by underlying problems, which must be solved to prevent the incident from recurring. Incident Management provides a point where these are reported.
- Configuration Management provides the data used to identify and progress incidents. One of the uses of the CMS is to identify faulty equipment and to assess the impact of an incident. It is also used to identify the users affected by potential problems. The CMS also contains information about which categories of incident should be assigned to which support group. In turn, Incident Management can maintain the status of faulty CIs. It can also assist Configuration Management to audit the infrastructure when working to resolve an incident.
- Change Management: Where a change is required to implement a workaround or resolution, this will need to be logged as an RFC and progressed through Change Management. In turn, Incident Management is able to detect and resolve incidents that arise from failed changes.
- Capacity Management: Incident Management provides a trigger for performance monitoring where there appears to be a performance problem. Capacity Management may develop workarounds for incidents.
- Availability Management: Availability Management uses Incident Management data to determine the availability of IT services and to look at where the incident lifecycle can be improved.
- Service Level Management (SLM): The ability to resolve incidents within a specified time is a key part of delivering an agreed level of service. Incident Management enables SLM to define measurable responses to service disruptions. It also provides reports that enable SLM to review SLAs objectively and regularly. In particular, Incident Management is able to assist in defining where services are at their weakest so that SLM can define actions as part of the Service Improvement Plan (SIP) – please see the Continual Service Improvement publication for more details. SLM defines the acceptable levels of service within which Incident Management works, including:
- Incident response times
- Impact definitions
- Target fix times
- Service definitions, which are mapped to users
- Rules for requesting services
- Expectations for providing feedback to users.
ITIL Incident Management Metrics / KPIs
The metrics that should be monitored and reported upon to judge the efficiency and effectiveness of the Incident Management process, and its operation, will include:
- Total numbers of Incidents (as a control measure)
- Breakdown of incidents at each stage (e.g. logged, work in progress, closed etc)
- Size of current incident backlog
- Number and percentage of major incidents
- Mean elapsed time to achieve incident resolution or circumvention, broken down by impact code
- Percentage of incidents handled within agreed response time (incident response-time targets may be specified in SLAs, for example, by impact and urgency codes)
- The average cost per incident
- Number of Incidents reopened and as a percentage of the total
- Number and percentage of incidents incorrectly assigned
- Number and percentage of incidents incorrectly categorized
- Percentage of Incidents closed by the Service Desk without reference to other levels of support (often referred to as ‘first point of contact’)
- Number and percentage of incidents processed per Service Desk agent
- Number and percentage of incidents resolved remotely, without the need for a visit
- Number of incidents handled by each Incident Model
- Breakdown of incidents by time of day, to help pinpoint peaks and ensure matching of resources.
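Several of the metrics above are simple ratios over closed incident records. The sketch below shows how a handful of them might be computed; the `IncidentRecord` fields and the `kpi_summary` function are hypothetical, and in practice the figures would normally come from the reporting module of the service desk tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IncidentRecord:
    hours_to_resolve: float
    within_response_target: bool
    reopened: bool
    closed_by_service_desk: bool  # resolved without other support levels


def kpi_summary(records):
    """Compute a few of the listed KPIs from closed incident records."""
    total = len(records)
    if total == 0:
        raise ValueError("no incident records supplied")

    def pct(flag):
        # Share of records for which `flag` is true, as a percentage.
        return 100.0 * sum(1 for r in records if flag(r)) / total

    return {
        "total_incidents": total,
        "mean_hours_to_resolve": mean(r.hours_to_resolve for r in records),
        "pct_within_response_target": pct(lambda r: r.within_response_target),
        "pct_reopened": pct(lambda r: r.reopened),
        "pct_first_contact_resolution": pct(lambda r: r.closed_by_service_desk),
    }
```

Running the same summary on records from before and after an improvement gives a like-for-like comparison against the baseline.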
Incident Management Reports
There should already be reports produced by the service desk on the number of incidents logged each week. Expand on the information in those reports to decide whether your new approach to incident management is effective. For example:
- In addition to recording the number of incidents logged each week, compare the numbers to incidents logged prior to implementing incident management
- Show the average length of time taken to resolve incidents before and after implementing incident management
- Where possible, show the types of incident reported
- Percentage of incidents handled within the agreed response time
- Show the percentage of incidents closed by the service desk without the need for contacting technical support
- Show the number and percentage of incidents resolved remotely, without the need for a visit
Reports summarize the data in non-technical language and show where improvements could be made. Often the improvements require expenditure, so having reports to back up your suggestions can prove invaluable.
Reports should be produced under the authority of the Incident Manager, who should draw up a schedule and distribution list, in collaboration with the Service Desk and support groups handling incidents. Distribution lists should at least include IT Services Management and specialist support groups. Consider also making the data available to users and customers, for example via SLA reports.
ITIL Incident Management Challenges, Critical Success Factors, and Risks
Challenges of Incident management
The following challenges will exist for successful Incident Management:
- The ability to detect incidents as early as possible. This will require education of the users reporting incidents, the use of Super Users, and the configuration of Event Management tools.
- Convincing all staff (technical teams as well as users) that all incidents must be logged, and encouraging the use of self-help web-based capabilities (which can speed up assistance and reduce resource requirements).
- Availability of information about problems and Known Errors. This will enable Incident Management staff to learn from previous incidents and also to track the status of resolutions.
- Integration into the CMS to determine relationships between CIs and to refer to the history of CIs when performing first-line support.
- Integration into the SLM process. This will help Incident Management to assess the impact and priority of incidents correctly and assist in defining and executing escalation procedures. SLM will also benefit from the information learned during Incident Management, for example in determining whether service level performance targets are realistic and achievable.
Critical Success Factors (CSF) of Incident Management
The following factors will be critical for successful Incident Management:
- A good Service Desk is key to successful Incident Management
- Clearly defined targets to work to – as defined in SLAs
- Adequate customer-oriented and technically trained support staff with the correct skill levels, at all stages of the process
- Integrated support tools to drive and control the process
- OLAs and UCs that are capable of influencing and shaping the correct behavior of all support staff.
Risks of Incident Management
The risks to successful Incident Management are actually similar to some of the challenges and the reverse of some of the Critical Success Factors mentioned above. They include:
- Being inundated with incidents that cannot be handled within acceptable timescales due to a lack of available or properly trained resources
- Incidents being bogged down and not progressed as intended because of inadequate support tools to raise alerts and prompt progress
- Lack of adequate and/or timely information sources because of inadequate tools or lack of integration
- Mismatches in objectives or actions because of poorly aligned or non-existent OLAs and/or UCs.
ITIL Incident Management plays a major role in IT Service Operation, delivering and supporting IT services for the business. The process helps the business resolve its ongoing issues and problems, increasing productivity and efficiency. IT Service Management (ITSM) support and delivery models depend on a high-quality Incident Management process.
Finally, I would like to ask you: what would you share from your own experience to improve this process and make it add more value for businesses and enterprises?