In ITIL 4, the purpose of the Problem Management practice is to reduce the likelihood and impact of incidents by identifying actual and potential causes of incidents and managing workarounds and known errors. ITIL 4 defines two key terms:
- Problem: a cause, or potential cause, of one or more incidents.
- Known error: a problem that has been analyzed but has not been resolved.
ITIL 4 Problem Management best practice involves three phases: problem identification, problem control, and error control.
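ITIL 2011 (the final update of ITIL v3 and the predecessor of ITIL 4) consists of five core publications: Service Strategy, Service Design, Service Transition, Service Operation, and Continual Service Improvement.
ITIL focuses on ensuring IT services are delivered successfully and efficiently. In ITIL v3, Problem Management is one of the processes in the Service Operation stage.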
What is ITIL Problem Management?
The Process Owner of Problem Management is the Problem Manager. ITIL defines Problem Management as described below.
The goal of Problem Management is to minimize both the number and severity of incidents and potential problems to the business/organization. Problem Management is the process responsible for managing the lifecycle of all problems. It aims to reduce the adverse impact of incidents and problems that are caused by errors within the IT infrastructure, and to prevent the recurrence of incidents related to these errors.
Problems should be addressed in priority order with higher priority given to the resolution of problems that can cause serious disruption to critical IT services. Problem Management’s responsibility is to ensure that incident information is documented in such a way that it is readily available to support all Problem Management activities.
Proactive vs. Reactive Problem Management
Reactive – Solving problems in response to one or more incidents. Reactive Problem Management is generally executed as part of Service Operation.
Proactive – Identifying and solving problems and known errors before incidents occur in the first place. Proactive Problem Management is initiated in Service Operation but is generally driven as part of Continual Service Improvement.
ITIL Problem Management Objectives
The primary objectives of Problem Management are to prevent problems and resulting incidents from happening, to eliminate recurring incidents, and to minimize the impact of incidents that cannot be prevented.
What is the Scope of Problem Management?
Problem Management includes the activities required to diagnose the root cause of incidents and to determine the resolution to those problems. It is also responsible for ensuring that the resolution is implemented through the appropriate control procedures, especially Change Management and Release Management.
Problem Management will also maintain information about problems and the appropriate workarounds and resolutions so that the organization is able to reduce the number and impact of incidents over time. In this respect, Problem Management has a strong interface with Knowledge Management, and tools such as the Known Error Database will be used for both.
Although Incident and Problem Management are separate processes, they are closely related and will typically use the same tools, and may use similar categorization, impact, and priority coding systems. This will ensure effective communication when dealing with related incidents and problems.
Difference between Incident Management and Problem Management
The aim of incident management is to restore the service to the user as quickly as possible, often through a workaround, rather than to find a permanent solution, which is the aim of problem management. The sections below look at the differences between Problem Management and Incident Management.
Incident vs problem
An incident occurs when there is an error or something does not work the way it is expected to. It is often referred to as:
- A fault
- An error
- It does not work!
- A problem
In ITIL, however, the term used is incident.
A problem can be:
- The occurrence of the same incident many times
- An incident that impacts many users
- The result of network diagnostics revealing systems not operating in the expected way
Therefore, a problem can exist without having an immediate impact on the users.
Incidents are usually more visible and the impact on the user is more immediate.
Examples of problems
Technical problems can exist without impacting the user. However, if they are not spotted and dealt with before an incident occurs, they can have a big impact on the availability of IT services.
Problems experienced by users
- The printer won’t feed paper automatically. The user must advance the paper by using the form feed button.
- Each time a new user logs onto a computer, they must reinstall the printer driver.
- Windows applications crash intermittently without an error message. The computer will restart and work properly afterwards.
- Disk space usage is erratic. Sometimes a considerable amount of disk space is available, but at other times little is available. There is no obvious reason and no impact to the users – yet.
- A network card is creating lots of unnecessary traffic on the network, which could eventually reduce the bandwidth available, leading to a slow response from network requests.
What are the Benefits of ITIL Problem Management?
The benefits of taking a formal approach to problem management include the following:
- Improved quality of the IT service. A high quality, reliable service is good for the business/organization.
- Incident volume reduction. Problem management is instrumental in reducing the number of incidents that interrupt the business/organization every day.
- Permanent solutions. There will be a gradual reduction in the number and impact of problems and known errors, as those that are resolved stay resolved.
- Improved organizational learning. The problem management process is based on the concept of learning from experience. The process provides the historical data to identify trends, and the means of preventing failures and of reducing the impact of failures, resulting in improved productivity.
- A better first-time fix rate at the Service Desk. Problem management enables the Service Desk to know how to deal with problems and incidents that have previously been resolved and documented.
What could affect the benefits of problem management?
The benefits of problem management can be weakened by:
- The absence of a good incident control process, and, therefore, the absence of detailed data on incidents (necessary for the correct identification of problems).
- The failure to link incident records with problem/error records.
- A lack of management or leadership commitment, so that support staff (usually also involved with reactive incident control activities) cannot allocate sufficient time to structural problem-solving activities.
- An unclear role for the Service Desk. All incident reports must come through the Service Desk, and difficulties may arise if the Service Desk is dealing with multiple reports of incidents and the technician is not fully aware of the extent of the problem.
- A failure to set aside time to build and update the call log or incident sheets, which will restrict the delivery of benefits.
- An inability to determine accurately the impact on the business/organization of incidents and problems; consequently the critical incidents and problems are not given the correct priority.
The Value of Problem Management to the Business/Organization
Problem Management works together with Incident Management and Change Management to ensure that IT service availability and quality are increased. When incidents are resolved, information about the resolution is recorded. Over time, this information is used to speed up the resolution time and identify permanent solutions, reducing the number and resolution time of incidents. This results in less downtime and less disruption to business-critical systems.
Additional value is derived from the following:
- Higher availability of IT services
- Higher productivity of business and IT staff
- Reduced expenditure on workarounds or fixes that do not work
- Reduced cost and effort spent on firefighting or resolving repeat incidents
- Improved quality of IT services
- Increased productivity and reduced cost
- Improved business or customer satisfaction
How Problem Management works
ITIL Problem Management works by using analysis techniques to identify the cause of the problem. Incident management is not usually concerned with the cause, only the cure: restoration of service. Problem management therefore takes longer and should be done once the urgency of the incident has been dealt with. For example, removing a faulty computer and replacing it with a working one takes the urgency away and leaves the faulty computer ready for diagnostics.
Problem management can take time. It is important to set time limits; otherwise resolution can become expensive.
To achieve its goal, problem management aims to:
- Identify the root cause – problem control
- Initiate actions to improve and correct the situation – error control
Error control covers the processes involved in the successful correction of known errors. The objective is to remove equipment with known errors that affect the IT infrastructure, in order to prevent the recurrence of incidents.
Error control activities can be reactive and proactive.
Reactive activities include:
- Identification of known errors through incident management
- Implementing a workaround
Proactive activities include:
- Finding a solution to a recurring problem
- Creating a solution
- Including the solution in the known error database
Inputs to problem management
Inputs to the problem management process are:
- Incident details from incident management
- Configuration details from the configuration management database
- Details about changes made to the affected equipment
- Any defined workarounds (from incident management)
Outputs from problem management
Outputs from the problem management process are:
- Known errors
- Requests for change (through change management)
- An updated problem record (including a solution and/or any available workarounds)
- Closed problem records for resolved problems
- Knowledgebase content to use in incident management
- Management information through reports
The process activities of Problem Management
The major activities of problem management are:
- Problem control
- Error control
- The proactive prevention of problems
- Identifying trends
- Obtaining management information from Problem Management data
- The completion of major incident or problem reviews
Roles and Responsibilities in the ITIL Problem Management Process
Process Owner – Problem Manager
The Problem Manager is responsible for managing the lifecycle of all Problems. The Problem Manager's primary objectives are to prevent Incidents from happening and to minimize the impact of Incidents that cannot be prevented.
To this end, the Problem Manager maintains information about Known Errors and Workarounds.
Responsibilities across the process are typically shared as follows:
- Service Desk to indicate that the problem has been passed to problem management
- Problem management to log, monitor and track the progress of the problem
- Service Desk, technical staff, or the problem manager/process owner to spot trends in incidents
- Problem management to action problems raised from incident management
- Problem management to assist with the handling of major incidents and identifying the root cause
- Technical staff to actively prevent the replication of problems across multiple systems
- Configuration management or change management specialists to be consulted
- Problem management to progress unresolved incidents through the problem management process
- Second line and third line support groups, including specialist support groups and external suppliers, to provide expertise
|ITIL Role / Sub-Process|Problem Manager|Applications Analyst|Technical Analyst|
|Proactive Problem Identification|AR|–|–|
|Problem Categorization and Prioritization|AR|–|–|
|Problem Diagnosis and Resolution|AR|R|R|
|Problem and Error Control|AR|–|–|
|Problem Closure and Evaluation|AR|–|–|
|Major Problem Review|AR|–|–|
|Problem Management Reporting|AR|–|–|
A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Problem Management process.
R: Responsible according to the RACI Model: Those who do the work to achieve a task within Problem Management.
Additional activities that form part of the problem management process
- Developing and maintaining the problem control process
- Reviewing the efficiency and effectiveness of the problem control process
- Producing management information
- Allocating resources for the support effort
- Monitoring the effectiveness of error control and making recommendations for improving it
- Developing and maintaining problem and error control systems
- Reviewing the efficiency and effectiveness of proactive problem management activities
Problem Management Implementation Considerations
- Good problem management relies to a great extent on an implemented and efficient incident management process. So, it is sensible to implement problem management after incident management has been implemented, is considered mature, and has established measures.
- If resources are scarce, it is advisable to concentrate on the implementation of problem and error control (reactive problem management). When these activities reach maturity, resources can be directed to proactive problem management which depends largely on the successful implementation of network monitoring and preventative maintenance.
- Reactive problem management can be introduced by focusing initially on the top ten incidents of the previous week. This can prove to be effective since experience shows that 20% of problems cause 80% of service degradation.
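The 80/20 observation above can be checked against your own incident log. The following is a minimal sketch, assuming last week's incidents are available as a plain list of category labels; the function name `pareto_categories` and the sample data are hypothetical and not part of ITIL. It returns the smallest set of most frequent categories that together account for a chosen share of all incidents.

```python
from collections import Counter

def pareto_categories(categories, share=0.8):
    """Return the most frequent categories that together cover `share` of all incidents."""
    counts = Counter(categories)
    total = sum(counts.values())
    selected, covered = [], 0
    # most_common() yields categories in descending order of count.
    for category, count in counts.most_common():
        if covered >= share * total:
            break
        selected.append((category, count))
        covered += count
    return selected

# Hypothetical incident categories from the previous week.
last_week = (["printer driver"] * 12 + ["password reset"] * 5
             + ["disk space"] * 2 + ["network card"] * 1)
top = pareto_categories(last_week)
for category, count in top:
    print(f"{category}: {count} incidents")
```

Categories returned by such an analysis are natural first candidates for reactive problem records, since resolving them should remove the largest share of repeat incidents.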
How to Implement ITIL Problem Management
- An effective system to log both incidents and problems, and their relationship, is fundamental for the success of problem management.
- Setting achievable objectives and making use of the problem-solving talents of existing staff is a key activity. Consider part-time problem management, whereby staff set aside periods when they will look at problems away from the daily fire-fighting pressures.
- Where the interests of incident management and problem management conflict, incident management (restoring service) takes priority.
Identify who will staff Problem Management
Problem management is a specialized process requiring a good understanding of the IT services being delivered and the tools and technology which support these services. It is expected that technical staff will carry out problem management with input from other specialists where possible, without a dedicated team in place. Specialist input or subscription to a support service may form part of the business/organization contract with a supplier.
Plan the Problem Management Training
The training plan for problem management should concentrate on the Service Desk and all levels of technical staff.
- Ensure the incident management process is understood as this is paramount to the success of the problem management process.
- Train the Service Desk staff on how to progress a call from an incident to a problem.
- Train the Service Desk staff to record incident details in a way that will help technical staff carry out root cause analysis. This will be evident from feedback from the technical staff after the first few problems have been passed through the problem management process.
- Produce process information for the technical staff on the amount of time to allocate to resolving a problem, to ensure cost control of problem resolution.
- Train the Service Desk staff and technical staff to identify patterns of incidents to indicate a problem.
Activities for the Problem Management Implementation Plan
|Problem Management Implementation Plan|
|Ensure that a mature and measurable incident management process is in place.|
|Decide who will be the problem manager/problem process owner.|
|Decide which staff will be involved as subject matter contacts for problem resolution.|
|Decide which training the technical staff require on the problem management process.|
|Decide which training the service desk staff will require.|
|Arrange and implement the required training.|
|Establish the required analysis techniques and document them.|
|Decide how calls will be passed to the problem management process from the Service Desk.|
|Decide which documentation will be used for the problem management process.|
|Decide if a knowledge base will be used.|
|Decide on the format of the knowledge base.|
|Decide how to populate the knowledge base from resolved problems.|
|Ensure the problem management process is documented.|
|Feed back any changes to the process identified during testing of the process.|
|Decide how resolutions will be written up and recorded.|
|Decide who carries out follow up actions and how that will be done.|
|Decide on the activities to review the problem management process.|
|Decide how to keep staff informed on all current problems/problem resolutions, and changes to the problem management process.|
|Decide if you need to run a pilot of the problem management process.|
|Carry out the pilot and pilot review.|
|Feed back changes into the system from the pilot review.|
|Plan the launch date of problem management.|
|Check that all training has occurred, and any required changes have been implemented.|
|Launch the process of problem management.|
Problem Management Reports
- Problem management reports should identify where isolating problems from incidents has provided benefits.
- Show the average time spent on problems per week.
- Show how many problems are deemed not cost-effective to resolve.
- Once implementation is complete, compare the incident levels to the previous weeks to see if problem-solving reduces incidents.
- Show how many problems are solved per week.
- Show the number of identified known errors and their associated workarounds produced from problem management.
- Over time, see if problem management reduces the incident management Top 10.
- Finally, if you implement problem management with incident management, show the number of incidents and problems each week. Over time it will become easier to identify the difference, so persevere with the reports.
ITIL Problem Management Process Flow / Process Diagram
The process flow and a detailed description of the process steps are as follows:
Trend analysis is the key to spotting problems. It is a proactive approach to problem management that lets you address a problem before it causes further incidents.
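As an illustration of trend analysis, the sketch below compares incident counts per category between two reporting periods and flags categories that are growing. It assumes incidents are available as lists of category labels; the function name `rising_categories`, the thresholds `min_count` and `growth`, and the sample data are hypothetical choices, not ITIL-defined values.

```python
from collections import Counter

def rising_categories(previous, current, min_count=3, growth=1.5):
    """Flag categories whose incident count grew by at least `growth` times."""
    prev, curr = Counter(previous), Counter(current)
    flagged = []
    for category, count in curr.items():
        baseline = prev.get(category, 0)
        # Treat a missing baseline as 1 so that brand-new categories can still be flagged.
        if count >= min_count and count >= growth * max(baseline, 1):
            flagged.append(category)
    return sorted(flagged)

# Hypothetical incident categories for two consecutive weeks.
week_1 = ["disk space"] * 2 + ["password reset"] * 6
week_2 = ["disk space"] * 5 + ["password reset"] * 6 + ["network card"] * 1
candidates = rising_categories(week_1, week_2)
print(candidates)
```

A flagged category is only a candidate: someone still has to decide whether the trend points to a single underlying cause worth logging as a problem.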
Problem logging is critical as all the necessary information from the incidents has to be captured while creating the problem. Create a problem from the Incidents, maintaining the link to the incident(s). Avoid duplicates by searching for similar existing problems before the creation of a new problem.
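The logging rules above (keep the link to the originating incidents, and search for similar problems before creating a new one) can be sketched as follows. This is only an illustration under stated assumptions: the `Problem` and `ProblemLog` classes, the incident IDs, and the 0.8 similarity threshold are hypothetical, and a real ITSM tool would use its own search facility rather than the simple string comparison from Python's `difflib`.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

@dataclass
class Problem:
    problem_id: int
    summary: str
    incident_ids: set = field(default_factory=set)  # links back to incidents

class ProblemLog:
    def __init__(self, similarity_threshold=0.8):
        self.problems = []
        self.similarity_threshold = similarity_threshold

    def find_similar(self, summary):
        """Return the first existing problem whose summary closely matches, else None."""
        for problem in self.problems:
            ratio = SequenceMatcher(
                None, problem.summary.lower(), summary.lower()
            ).ratio()
            if ratio >= self.similarity_threshold:
                return problem
        return None

    def log(self, summary, incident_ids):
        """Link incidents to a similar existing problem, or create a new problem."""
        problem = self.find_similar(summary)
        if problem is None:
            problem = Problem(len(self.problems) + 1, summary)
            self.problems.append(problem)
        problem.incident_ids.update(incident_ids)
        return problem

problem_log = ProblemLog()
first = problem_log.log("Printer driver must be reinstalled at logon", ["INC-101", "INC-107"])
second = problem_log.log("printer driver must be reinstalled at logon", ["INC-115"])
third = problem_log.log("Disk space usage is erratic", ["INC-120"])
```

Here the second call finds the existing record and attaches the new incident to it, so only two problems are created for the four incidents.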
Problem categorization is essential to avoid ambiguities. The categorization makes it simpler to search incidents and associated problem records.
Problem prioritization helps technical staff to identify critical problems that need to be addressed. The impact and urgency associated with a problem decide which problems need to be addressed first. When a problem is created from an incident, the impact, urgency, and priority values are inherited from the incident automatically, which reduces the prioritization effort for technical staff.
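One common way to derive priority from impact and urgency is a simple matrix. The sketch below assumes three levels for each and a priority scale from 1 (most critical) to 5; the `Level` enum, the `priority` function, and the sample problems are illustrative assumptions, and your organization's matrix may differ.

```python
from enum import IntEnum

class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3

def priority(impact: Level, urgency: Level) -> int:
    """Map impact and urgency to a priority from 1 (critical) to 5 (lowest)."""
    return impact + urgency - 1

# Hypothetical open problems: (summary, impact, urgency).
problems = [
    ("Erratic disk space usage", Level.LOW, Level.MEDIUM),
    ("Intermittent application crashes", Level.HIGH, Level.HIGH),
    ("Printer driver reinstall at logon", Level.MEDIUM, Level.MEDIUM),
]
# Work queue ordered so that the most critical problems come first.
queue = sorted(problems, key=lambda p: priority(p[1], p[2]))
for summary, impact, urgency in queue:
    print(priority(impact, urgency), summary)
```

Because incidents and problems can share the same impact and urgency coding, the same function can be applied to both record types.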
Investigation and Diagnosis
Problem investigation results in getting to the root cause of the problem and initiating actions to resume the failed service. Analyze the impact, root cause and symptoms of the problem to provide a problem resolution.
The successful diagnosis of a root cause turns the problem into a known error, for which a workaround is suggested. Service Desk staff can browse these known error records and workarounds to resolve incidents and lower the incident resolution time.
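To illustrate how the Service Desk might look up known errors while handling an incident, here is a minimal sketch of a keyword search over known error records. The `KnownError` structure, the `search_kedb` function, and the sample entries are hypothetical; commercial tools provide their own, more capable search.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnownError:
    error_id: str
    symptoms: str
    workaround: str

def search_kedb(kedb, incident_description):
    """Rank known errors by how many symptom words appear in the incident description."""
    words = set(incident_description.lower().split())
    scored = []
    for entry in kedb:
        overlap = len(words & set(entry.symptoms.lower().split()))
        if overlap:
            scored.append((overlap, entry))
    # Highest overlap first; entries with equal overlap keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]

# Hypothetical known error records.
kedb = [
    KnownError("KE-1", "printer does not feed paper",
               "Advance the paper with the form feed button"),
    KnownError("KE-2", "new user must reinstall printer driver",
               "Reinstall the printer driver at first logon"),
]
matches = search_kedb(kedb, "User reports printer will not feed paper")
if matches:
    print(matches[0].error_id, "->", matches[0].workaround)
```

The best match gives the Service Desk a documented workaround to apply immediately, which is how known error records shorten incident resolution time.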
Once a resolution has been established, either a temporary workaround logged as a known error in the known error database or a permanent solution raised via a request for change, the problem can be closed.
In the case of a permanent solution, the problem record is not usually closed until the request for change has been implemented. If the request for change is refused, the problem record is updated with the reasons why and is then closed.
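The closure rules described above can be expressed as a small check. This is a sketch under assumptions: the `ProblemRecord` fields, the `RfcStatus` values, and the `can_close` function are hypothetical names chosen for illustration, not fields of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class RfcStatus(Enum):
    PENDING = "pending"
    IMPLEMENTED = "implemented"
    REFUSED = "refused"

@dataclass
class ProblemRecord:
    summary: str
    workaround_in_kedb: bool = False
    rfc_status: Optional[RfcStatus] = None  # None: no request for change raised
    closure_note: str = ""

def can_close(record: ProblemRecord) -> bool:
    """Decide whether a problem record may be closed."""
    if record.rfc_status is None:
        # No permanent fix requested: a known error with a workaround is enough.
        return record.workaround_in_kedb
    if record.rfc_status is RfcStatus.PENDING:
        # A permanent solution is under way: wait until the change is implemented.
        return False
    if record.rfc_status is RfcStatus.REFUSED:
        # A refused change closes the record, but the reasons must be recorded first.
        return bool(record.closure_note)
    return True  # RfcStatus.IMPLEMENTED

pending = ProblemRecord("Faulty network card", rfc_status=RfcStatus.PENDING)
refused = ProblemRecord("Faulty network card", rfc_status=RfcStatus.REFUSED,
                        closure_note="Replacement not cost-effective")
print(can_close(pending), can_close(refused))
```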
Problem Management Consideration in ITIL 4 Service Value Chain
The Problem Management process described above follows the ITIL v3 specification, in which Problem Management is a process in the Service Operation stage of the service lifecycle. ITIL 4 is no longer prescriptive about processes. Instead, it shifts the focus to 34 ITIL 4 practices, giving IT leaders more freedom to define tailor-made processes.
ITIL 4 refers to Problem Management as a Service Management Practice, describing the key activities, inputs, outputs, and roles. Following this guidance, organizations are advised to design a process for managing Problems in line with their specific requirements and problems.
Since the processes defined in ITIL v3 have not been invalidated by the introduction of ITIL 4, you can still use the ITIL v3 Problem Management process as a template.
Problem management is usually focused on errors in operational environments. It contributes to the service value chain as follows, with the practice being applied mainly to the improve and the deliver and support value chain activities:
Improve: This is the focus area for problem management. Effective Problem Management provides the understanding needed to reduce the number of incidents and the impact of incidents that cannot be prevented.
Engage: Problems that have a significant impact on services will be visible to customers and users. In some cases, customers may wish to be involved in problem prioritization, and the status and plans for managing problems should be communicated. Workarounds are often presented to users via a service portal.
Design and transition: Problem management provides information that helps to improve testing and knowledge transfer.
Obtain/build: Product defects may be identified by problem management; these are then managed as part of this value chain activity.
Deliver and support: Problem management makes a significant contribution by preventing incident repetition and supporting timely incident resolution.
ITIL Problem Management Metrics / Key Performance Indicators (KPIs)
The following metrics should be used to judge the effectiveness and efficiency of the Problem Management process, or its operation:
- The total number of problems recorded in the period (as a control measure)
- The percentage of problems resolved within SLA targets (and the percentage that are not!)
- The number and percentage of problems that exceeded their target resolution times
- The backlog of outstanding problems and the trend (static, reducing, or increasing?)
- The average cost of handling a problem
- The Root Cause Analysis (RCA) Reports
- The number of major problems (opened and closed and backlog)
- The number of Known Errors added to the KEDB (Known Error Database)
- The percentage accuracy of the KEDB (from audits of the database)
- The percentage of Major Problem Reviews completed successfully and on time.
All metrics should be broken down by category, impact, severity, urgency, and priority level and compared with previous periods.
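A few of the metrics listed above can be computed directly from exported problem records. The sketch below assumes each record carries a priority, a target resolution time in days, and the actual time to resolve (empty while the problem is still open); the `ProblemStat` structure, the `kpi_summary` function, and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProblemStat:
    priority: int
    target_days: int
    days_to_resolve: Optional[int]  # None while the problem is still open

def kpi_summary(problems):
    """Compute total recorded, backlog, and percentage resolved within target."""
    resolved = [p for p in problems if p.days_to_resolve is not None]
    within = [p for p in resolved if p.days_to_resolve <= p.target_days]
    return {
        "total_recorded": len(problems),
        "backlog": len(problems) - len(resolved),
        "pct_resolved_within_target": (
            round(100 * len(within) / len(resolved), 1) if resolved else None
        ),
    }

# Hypothetical problem records for one reporting period.
period = [
    ProblemStat(priority=1, target_days=5, days_to_resolve=3),
    ProblemStat(priority=2, target_days=10, days_to_resolve=14),
    ProblemStat(priority=2, target_days=10, days_to_resolve=None),
    ProblemStat(priority=3, target_days=20, days_to_resolve=20),
]
summary = kpi_summary(period)
print(summary)
```

To break the figures down by priority, call `kpi_summary` on the subset of records for each priority level and compare the results with the previous period.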
What are the Challenges, Critical Success Factors and Risks of Problem Management?
A major dependency for Problem Management is the establishment of an effective Incident Management process and tools. This will ensure that problems are identified as soon as possible and that as much work is done on pre-qualification as possible. However, it is also critical that the two processes have formal interfaces and common working practices.
This implies the following:
- Linking Incident and Problem Management tools
- The ability to relate Incident and Problem Records
- A good working relationship between second- and third-line staff and first-line staff
- Making sure that business impact is well understood by all staff working on problem resolution.
In addition, it is important that Problem Management can use all available Knowledge and Configuration Management resources.
Another Critical Success Factor (CSF) is the ongoing training of technical staff in both technical aspects of their job as well as the business implications of the services they support and the processes they use.
ITIL Problem Management supports the ITIL process and practice of finding and fixing the root causes of issues that result in incidents. You can record problems, associate them with incidents, and assign them to appropriate groups. You can create knowledge from problems, request changes, escalate, and manage problems through to resolution and reporting.
Problem Management is an essential ITIL process for improving productivity and business performance by avoiding interruptions from repeated incidents. Effective implementation and awareness of this process are key to driving IT-business relations strategically. As for ITIL Problem Management tools, I would prefer ServiceNow and Jira. You can try BMC as well. These are very good cloud tools that follow ITIL best practices and the ITIL framework.
ITIL® is a registered trademark of AXELOS Limited. IT Infrastructure Library® is a registered trademark of AXELOS Limited.