All about Service-Level Agreement (SLA) Management

What: An SLA defines the level of service you expect from a vendor and sets out the metrics by which that service is measured.

It also specifies the remedies or corrections that apply should the agreed-upon service levels not be achieved.

Why: It gathers information on all of the contracted services and their agreed-upon expected reliability into a single document.

An SLA clearly states metrics, responsibilities, and expectations so that neither party can claim ignorance in the event of issues with the service. It ensures both sides have the same understanding of the requirements.

How:

1.     Maintain documents on a regular basis

·        Customers want to keep track of frequent changes to the project, so the SLA document should be reviewed and updated regularly to make sure it still reflects the customer’s expectations.

2.      The SLA should take top priority

·        As mentioned above, the SLA is a key factor in any project, so it should be reviewed regularly.

·        We perform QA on a daily basis, which makes it easy for the person doing QA to review the SLA for tickets that are under review, relevant to MTTI, or important to management.

·        Such tickets should be highlighted and recorded in a separate sheet with all the details: review, labeling, RCA, and SOP.
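The daily review described above could be partly scripted. The following is a minimal sketch, assuming each ticket carries a time-to-acknowledge value and that the SLA sets an acknowledgement target; the 15-minute target, the ticket IDs, and the field names are all hypothetical and are not taken from any real SLA.

```python
from datetime import timedelta

# Hypothetical SLA target: acknowledge every ticket within 15 minutes.
ACK_TARGET = timedelta(minutes=15)

# Hypothetical sample tickets, as the QA reviewer might export them.
tickets = [
    {"id": "T-101", "time_to_acknowledge": timedelta(minutes=5)},
    {"id": "T-102", "time_to_acknowledge": timedelta(minutes=40)},
]


def breached(tickets, target):
    """Return the IDs of tickets whose acknowledgement exceeded the target."""
    return [t["id"] for t in tickets if t["time_to_acknowledge"] > target]


to_highlight = breached(tickets, ACK_TARGET)
print(to_highlight)  # ['T-102']
```

The resulting list is what would be copied to the separate sheet for review, RCA, and SOP follow-up.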

Mean Time to Identify (MTTI)

What: MTTI is the average time it takes to identify a problem in the current project after it first occurs.

Why: Tracking it helps the team identify an issue, and resolve it, faster each time the issue repeats.

How: First and foremost, the team should be familiar with the environment, old and new deployments, services, architecture and infrastructure, and the application flow, so that it can understand the blocker and quickly identify where the problem arises.

There should be flow charts, activity diagrams, and use case diagrams to help the team understand the environment and detect issues quickly and accurately.

The following details should be maintained for each issue:

Issue:

Environment:

Reported By:

Ticket:

Issue Start Time:

Reported Time:

First Acknowledge Time:

Fixed Time:

RCA:

Fixed By:

Monitoring Link:

Violation Duration:

Action Taken:
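One way to keep these details consistent is to hold each issue in a typed record. The sketch below mirrors the fields listed above as a Python dataclass. It assumes that Violation Duration means the time from Issue Start Time to Fixed Time, which the list itself does not define, and all sample values (ticket ID, names, URL) are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    """One row of the tracking sheet; field names mirror the list above."""
    issue: str
    environment: str
    reported_by: str
    ticket: str
    issue_start_time: datetime
    reported_time: datetime
    first_acknowledge_time: datetime
    fixed_time: datetime
    rca: str
    fixed_by: str
    monitoring_link: str
    action_taken: str

    @property
    def violation_duration(self) -> timedelta:
        # Assumption: the violation lasts from issue start until the fix.
        return self.fixed_time - self.issue_start_time


# Hypothetical sample data.
record = IncidentRecord(
    issue="Checkout API returning errors",
    environment="production",
    reported_by="monitoring alert",
    ticket="T-102",
    issue_start_time=datetime(2024, 1, 5, 10, 0),
    reported_time=datetime(2024, 1, 5, 10, 10),
    first_acknowledge_time=datetime(2024, 1, 5, 10, 15),
    fixed_time=datetime(2024, 1, 5, 11, 30),
    rca="Connection pool exhausted after deployment",
    fixed_by="on-call engineer",
    monitoring_link="https://monitoring.example.com/incident/102",
    action_taken="Rolled back deployment and raised pool size",
)
print(record.violation_duration)  # 1:30:00
```

Deriving Violation Duration from the timestamps, rather than typing it in, removes one field that could be entered inconsistently.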

Mean Time to Resolve (MTTR)

What: MTTR is the average time between the start and the resolution of an incident.

Why: As DevOps teams release more often and automate more, performance and availability problems have increased.

As a result, Ops spends more time troubleshooting, and development is drawn into production troubleshooting.

Reducing MTTR and MTTI is more important than ever.

How: As mentioned under MTTI, the first step toward any resolution is to identify the problem.

Once the problem is identified, we can route it to the correct team for resolution and revisit it with an RCA once the fix is complete.

The team needs to be proactive in identifying unexpected conditions and undesired behaviours whenever a deployment is rolled out.

As of now, we maintain a manual sheet for MTTR data. This carries a high risk of human error, consumes resource bandwidth, and requires a supervisor to keep checking that all the data is filled in correctly.

Data such as MTTR and MTTI should be fetched directly from automation, where the chance of error is much lower. That lets us easily track down gaps in the team and channel resources toward reducing MTTR/MTTI.
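As an illustration of what such automation might compute, here is a minimal sketch that derives MTTI and MTTR from incident timestamps. It assumes MTTI is the mean of (identified time minus start time) and MTTR is the mean of (fixed time minus start time); the dictionary keys and the sample incidents are hypothetical, and in practice the timestamps would come from the ticketing or monitoring system rather than a hand-written list.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_duration(pairs):
    """Average the gap between (earlier, later) datetime pairs."""
    seconds = [(later - earlier).total_seconds() for earlier, later in pairs]
    return timedelta(seconds=mean(seconds))


# Hypothetical incidents with the three timestamps needed for both metrics.
incidents = [
    {
        "start": datetime(2024, 1, 5, 10, 0),
        "identified": datetime(2024, 1, 5, 10, 10),
        "fixed": datetime(2024, 1, 5, 11, 0),
    },
    {
        "start": datetime(2024, 1, 9, 14, 0),
        "identified": datetime(2024, 1, 9, 14, 20),
        "fixed": datetime(2024, 1, 9, 16, 0),
    },
]

mtti = mean_duration((i["start"], i["identified"]) for i in incidents)
mttr = mean_duration((i["start"], i["fixed"]) for i in incidents)

print("MTTI:", mtti)  # MTTI: 0:15:00
print("MTTR:", mttr)  # MTTR: 1:30:00
```

Because both figures are derived from the same timestamps, no one has to re-enter or double-check them by hand.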

Deviation in Project Schedule

What: Any difference between the baseline plan of a project and what is actually achieved toward the goal.

Why:

How: To begin with, the project manager identifies the deviation from the baseline performance, then establishes the causes of the deviation and assesses the severity of its impact.

Inform the Project Manager of any change; every change should be channelled through Management.

Analysis of Change:

·       Identify the relevant key performance indicators.

·       Evaluate the scope of the deviation.

·       Measure the degree of impact on project performance.

·       Identify the causes of the deviation.

·       Establish corrective action.

·       Estimate the resources needed.

·       Establish a timeline for correcting the deviation.

·       Recommend preventive action.
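The "evaluate the scope" and "measure the degree of impact" steps above can be sketched as a simple comparison of baseline and actual milestone dates. This is only an illustration: the milestone names, the dates, and the seven-day threshold separating a minor from a major deviation are assumptions, not values from any project plan.

```python
from datetime import date

# Hypothetical milestones: name -> (baseline date, actual date).
milestones = {
    "Design sign-off": (date(2024, 3, 1), date(2024, 3, 1)),
    "Staging deployment": (date(2024, 3, 15), date(2024, 3, 20)),
    "Production release": (date(2024, 4, 1), date(2024, 4, 12)),
}


def deviation_days(baseline, actual):
    """Positive means late, negative means early, zero means on plan."""
    return (actual - baseline).days


def severity(days, minor_limit=7):
    # Assumed rule of thumb: up to `minor_limit` days late counts as minor.
    if days <= 0:
        return "on track"
    return "minor" if days <= minor_limit else "major"


report = {}
for name, (baseline, actual) in milestones.items():
    days = deviation_days(baseline, actual)
    report[name] = (days, severity(days))

for name, (days, level) in report.items():
    print(f"{name}: {days} days, {level}")
```

A report like this gives the project manager a starting point for the remaining steps, such as identifying causes and estimating the resources needed.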

KPI (Internal)

What: A quantifiable measure of performance over time for a specific objective. It demonstrates how effectively a company is achieving key business objectives.

Why: KPIs are more than numbers you report out weekly. They enable you to understand the performance and health of your business so that you can make critical adjustments in your execution to achieve your strategic goals.

How: Work through the questions below to determine the KPIs for the team:

·       What is your desired outcome? (Defined by the Scope of Work)

·       Who is responsible for the business outcome? (Check with Project Manager)

·       How are you going to measure progress? (MTTI / MTTR)

·       How often will you review progress towards the outcome? (Daily QA)

·       How will you know you’ve achieved your outcome? (Closure on Ticket/Project with RCA & Required information)

Create SMART KPI:

·       Is your objective Specific? (Yes/No)

·       Can you Measure progress towards that goal? (MTTI / MTTR & QA)

·       Is the goal realistically Attainable? (Yes/No)

·       How Relevant is the goal to your organization? (Discuss with Project Management)

·       What is the Time-frame for achieving this goal? (Every project should have ETA & New Implementation should be defined as new Scope of Work)

Artefacts / Record

What: We have different types of knowledge assets on Arlo, such as SOPs, the Knowledge Base, Inventory, and Confluence, to maintain the artefacts and records.

Why: This is a very important part of the SLA, as the team should be on the same page and follow the same steps, leaving no gaps that could cause disruption.

How: There are two categories, as described below:

1.     Pre-defined

·       Repetitive tasks that are done on a daily basis should have all required documents in place.

·       The team should follow the agreed standard procedures to meet the goal.

·       Every single member of the team should be on the same page.

2.     Post-defined

·       Many new incoming projects require a POC.

·       Such projects should always be channelled through Management to define the Scope of Work.

·       Once the project falls within our scope, we should carry out a POC that takes DR and backups into account.

·       Once we have a POC we are confident in, we need to create an SOP, share it with the stakeholders, and verify it.

·       Once approved, upload it to Confluence.


Thanks & Regards,
Tapan Patni
Email: tapanpatni58@gmail.com
