All about Service-Level Agreement (SLA) Management
What: An SLA defines the level of service you expect from a vendor and lays out the metrics by which that service is measured, as well as the fixes or corrections that apply should the agreed-on service levels not be achieved.
Why: An SLA pulls together information on all of the contracted services and their agreed-upon expected reliability into a single document. It clearly states metrics, responsibilities, and expectations so that neither party can claim ignorance in the event of issues with the service. It ensures both sides have the same understanding of requirements.
How:
1. Maintain documents on a regular basis
· Customers want to understand frequent changes to the project, so the SLA document should be reviewed and updated regularly to keep it in line with the customer's expectations.
2. The SLA should take top priority
· As mentioned above, the SLA is a key factor in any project, so it should be reviewed regularly.
· We perform QA on a daily basis, which makes it easy for the person doing QA to review the SLA for tickets that are under review, relevant to MTTI, or important for management.
· Tickets of this kind should be highlighted and recorded in a separate sheet with all the details: review, labeling, RCA, and SOP in place.
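The highlighting step described above could be automated along these lines. This is only a minimal sketch: the `Ticket` fields, the three highlight conditions as boolean flags, and the CSV column names are all assumptions made for illustration, not the layout of any real ticketing system.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Ticket:
    # All field names are hypothetical, chosen for illustration only.
    ticket_id: str
    under_review: bool
    affects_mtti: bool
    management_priority: bool
    rca: str = ""
    sop: str = ""


def needs_sla_review(ticket: Ticket) -> bool:
    """A ticket is highlighted if any of the three conditions holds."""
    return ticket.under_review or ticket.affects_mtti or ticket.management_priority


def export_highlighted(tickets: list[Ticket]) -> str:
    """Return CSV text for the separate sheet of highlighted tickets."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Ticket", "Under Review", "MTTI", "Management", "RCA", "SOP"])
    for t in tickets:
        if needs_sla_review(t):
            writer.writerow([t.ticket_id, t.under_review, t.affects_mtti,
                             t.management_priority, t.rca, t.sop])
    return buffer.getvalue()


# Made-up sample tickets.
tickets = [
    Ticket("T-1", under_review=True, affects_mtti=False, management_priority=False),
    Ticket("T-2", under_review=False, affects_mtti=False, management_priority=False),
    Ticket("T-3", under_review=False, affects_mtti=True, management_priority=True,
           rca="pending", sop="draft"),
]
sheet = export_highlighted(tickets)
print(sheet)
```

With the sample data, only T-1 and T-3 end up in the exported sheet, since T-2 meets none of the conditions.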
Mean Time to Identify (MTTI)
What: MTTI is the average time taken to identify a problem in the current project in the first place.
Why: This helps the team identify an issue and resolve it faster each time the issue repeats.
How: First and foremost, the team should be familiar with the environment, old and new deployments, services, architecture/infrastructure, and the application flow, so that it can understand a blocker and quickly identify where the problem arises. There should be flow charts, activity diagrams, and use case diagrams to help the team understand the environment and detect issues quickly and accurately.
The following details should be maintained for each issue:
Issue:
Environment:
Reported By:
Ticket:
Issue Start Time:
Reported Time:
First Acknowledge Time:
Fixed Time:
RCA:
Fixed By:
Monitoring Link:
Violation Duration:
Action Taken:
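The fields listed above map naturally onto a small record type. The sketch below is illustrative only: it assumes that identification is complete at the first acknowledgement and that the violation runs from issue start until the fix. Neither definition is stated in this post, and all sample values are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    issue: str
    environment: str
    reported_by: str
    ticket: str
    issue_start_time: datetime
    reported_time: datetime
    first_acknowledge_time: datetime
    fixed_time: datetime
    rca: str
    fixed_by: str
    monitoring_link: str
    action_taken: str

    @property
    def time_to_identify(self) -> timedelta:
        # Assumption: identification is complete at first acknowledgement.
        return self.first_acknowledge_time - self.issue_start_time

    @property
    def violation_duration(self) -> timedelta:
        # Assumption: the violation lasts from issue start until the fix.
        return self.fixed_time - self.issue_start_time


# Placeholder values throughout; none of these come from a real incident.
record = IncidentRecord(
    issue="API latency above threshold",
    environment="production",
    reported_by="monitoring",
    ticket="TICKET-1",
    issue_start_time=datetime(2024, 1, 1, 10, 0),
    reported_time=datetime(2024, 1, 1, 10, 5),
    first_acknowledge_time=datetime(2024, 1, 1, 10, 12),
    fixed_time=datetime(2024, 1, 1, 11, 0),
    rca="pending",
    fixed_by="on-call engineer",
    monitoring_link="https://example.com/dashboard",
    action_taken="restarted service",
)
print(record.time_to_identify)    # 0:12:00
print(record.violation_duration)  # 1:00:00
```

Deriving the violation duration from the timestamps, rather than typing it in, removes one field that could otherwise be entered inconsistently.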
Mean Time to Resolve (MTTR)
What: MTTR is the average time between the start and resolution of an incident.
Why: As DevOps teams release more often and automate more, performance and availability problems have increased. As a result, Ops spends more time troubleshooting, and development is drawn into production troubleshooting. Reducing MTTR and MTTI is more important than ever.
How: As mentioned under MTTI, the first step toward any resolution is to identify the problem. Once the problem is identified, we can channel it to the correct team for resolution and revisit it with an RCA once it is resolved. The team needs to be proactive in identifying unexpected conditions and undesired behaviours whenever deployments take place.
As of now, we maintain a manual sheet to enter MTTR data, which carries a high chance of human error, consumes resource bandwidth, and requires a supervisor to keep checking that all the data are filled in correctly. Data like MTTR and MTTI should be fetched directly from automation, where the chance of error is much lower, so that we can easily track down gaps in the team and channel resources toward reducing MTTR/MTTI.
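As a rough illustration of computing these figures automatically instead of from a manual sheet, the sketch below averages durations over a list of incidents. The input format (ISO timestamps for issue start, first acknowledgement, and fix) is an assumed export from a ticketing or monitoring tool, the sample rows are made up, and measuring MTTI up to the first acknowledgement is likewise an assumption.

```python
from datetime import datetime, timedelta

# Hypothetical export: (issue start, first acknowledgement, fix) per incident.
incidents = [
    ("2024-01-01T10:00:00", "2024-01-01T10:10:00", "2024-01-01T11:00:00"),
    ("2024-01-02T09:00:00", "2024-01-02T09:20:00", "2024-01-02T09:40:00"),
    ("2024-01-03T14:00:00", "2024-01-03T14:30:00", "2024-01-03T16:20:00"),
]


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Arithmetic mean of a non-empty list of durations."""
    return sum(deltas, timedelta()) / len(deltas)


def mtti_and_mttr(rows):
    """Return (MTTI, MTTR), both measured from the issue start time."""
    identify, resolve = [], []
    for start, acknowledged, fixed in rows:
        t0 = datetime.fromisoformat(start)
        identify.append(datetime.fromisoformat(acknowledged) - t0)
        resolve.append(datetime.fromisoformat(fixed) - t0)
    return mean_delta(identify), mean_delta(resolve)


mtti, mttr = mtti_and_mttr(incidents)
print("MTTI:", mtti)  # MTTI: 0:20:00
print("MTTR:", mttr)  # MTTR: 1:20:00
```

Because the averages are derived from timestamps the tooling already records, no one has to re-enter or double-check the numbers by hand.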
Deviation in Project Schedule
What: Any difference between the baseline plan of a project and what has actually been achieved toward the goal.
Why:
How: To begin with, the project manager identifies the deviation from the baseline performance, then establishes the causes of the deviation and assesses the severity of the impact. Inform the Project Manager of any change; every change should be channelled through Management.
Analysis of Change:
· Identify the relevant key performance indicators.
· Evaluate the scope of the deviation.
· Measure the degree of impact on project performance.
· Identify the causes of the deviation.
· Establish corrective action.
· Estimate the resources needed.
· Establish a time frame for the deviation change.
· Recommend preventive action.
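The "evaluate the scope" and "measure the degree of impact" steps above can be made concrete by comparing baseline and actual dates per milestone. The following is a minimal sketch under stated assumptions: the milestone names, the dates, and the 14-day severity threshold are all invented for illustration.

```python
from datetime import date

# Hypothetical milestones: name -> (baseline finish, actual or forecast finish).
milestones = {
    "Design": (date(2024, 3, 1), date(2024, 3, 1)),
    "Build": (date(2024, 4, 15), date(2024, 4, 22)),
    "Go-live": (date(2024, 5, 1), date(2024, 5, 20)),
}

# Assumed threshold: slips above this many days count as high severity.
HIGH_SEVERITY_DAYS = 14


def deviation_days(baseline: date, actual: date) -> int:
    """Positive means late, negative means early, zero means on plan."""
    return (actual - baseline).days


def severity(days: int) -> str:
    """Classify a slip using the assumed threshold above."""
    if days <= 0:
        return "none"
    return "high" if days > HIGH_SEVERITY_DAYS else "low"


report = {
    name: (deviation_days(b, a), severity(deviation_days(b, a)))
    for name, (b, a) in milestones.items()
}
for name, (days, level) in report.items():
    print(f"{name}: deviation {days} day(s), severity {level}")
```

A report like this only quantifies the deviation; finding the causes and choosing corrective and preventive actions still needs the project manager's judgment.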
KPI (Internal)
What: A quantifiable measure of performance over time for a specific objective. It demonstrates how effectively a company is achieving key business objectives.
Why: KPIs are more than numbers you report out weekly. They enable you to understand the performance and health of your business so that you can make critical adjustments in your execution to achieve your strategic goals.
How: Work through the questions below to find the KPIs for the team:
· What is your desired outcome? (Defined from the Scope of Work)
· Who is responsible for the business outcome? (Check with the Project Manager)
· How are you going to measure progress? (MTTI / MTTR)
· How often will you review progress towards the outcome? (Daily QA)
· How will you know you’ve achieved your outcome? (Closure of the Ticket/Project with RCA & required information)
Create SMART KPIs:
· Is your objective Specific? (Yes/No)
· Can you Measure progress towards that goal? (MTTI / MTTR & QA)
· Is the goal realistically Attainable? (Yes/No)
· How Relevant is the goal to your organization? (Discuss with Project Management)
· What is the Time-frame for achieving this goal? (Every project should have an ETA, and any new implementation should be defined as a new Scope of Work)
Artefacts / Record
What: We have different types of assets for our knowledge on Arlo, such as SOPs, a Knowledge Base, Inventory, and Confluence, to maintain the Artefacts / Records.
Why: This is a very important part of the SLA, as the team should be on the same page and follow the same steps to achieve perfection and leave no gaps that could cause any kind of disturbance.
How: There are 2 categories, as described below:
1. Pre-defined
· Repetitive tasks that are done on a daily basis should have all required documents in place.
· The team should follow the agreed standard procedures to achieve the goal.
· Every single member of the team should be on the same page.
2. Post-defined
· There are many new incoming projects that require a POC.
· Such projects should always be channelled through Management to define the Scope of Work.
· Once the project falls under our scope, we should do a POC that takes DR and backups into account.
· Once we have a confident POC, we need to create an SOP, share it with the stakeholders, and verify it.
· Once approved, upload it to Confluence.
Tapan Patni
Email: tapanpatni58@gmail.com
Linkedin: https://www.linkedin.com/in/tapan-patni
BlogSpot: https://tapanpatni58.blogspot.com