As Service Reliability Supervisor, you will report to the Service Reliability Manager and work closely with the Network Operations Center team globally to establish and maintain a high-performing and highly available game service for players around the world. You’ll help manage a team at Riot’s LA campus that monitors and supports all aspects of production environments, development environments, and general system needs. Your management skills and understanding of operations will help you diagnose and communicate potential issues to Rioters and the community, improving the quality of the player experience.
Some of the challenges that you’ll encounter include overseeing incident management of a 24/7/365 team in an environment where every minute counts. You’ll provide guidance during stressful situations and be the paragon of steadfast decision-making. You’ll also help evolve the strategic direction, implement tactical goals, and maintain the health of the team.
Responsibilities
- Ensure that your direct reports are meeting team and individual measurements.
- Be the escalation point for the Network Operations Center, which responds to, mitigates, and resolves incidents.
- Maintain performance of the team through hiring, training, assigning and evaluating work, and taking corrective action where necessary.
- Guide team members’ technical and professional growth.
- Ensure that the team is operating in compliance with local laws and regulations.
- Plan, design, and implement solutions that support NOC operations.
- Contribute to the strategic direction of growth and capacity planning established by the Global NOC manager.
- Develop and collaborate on policies and processes for all Network Operations Center environments with the NOC Leadership team.
- Manage NOC service level agreement commitments with product owners and service teams.
- Coordinate communication, training, and work across a global 24/7/365 team.
- Establish plans and policies for business disaster recovery.
Required Qualifications
- 4 years of experience working in technology operations.
- 2 years of experience leading a team and managing for performance.
- Strong verbal and written communication skills.
- Understanding of the basic technologies involved in running an online service and the advancements the industry is making.
- Knowledge of general networking and system triage and configuration, with the ability to understand metrics and distill the essential action points taken during incidents from a technical perspective.
Desired Qualifications
- ITIL Foundation v4 certification
- Degree in information technology, information systems, technical operations, or equivalent experience
- Experience with SRE (Site Reliability Engineering)
- Experience in a time-critical, globally distributed NOC that supports multiple data centers
- Gamer empathy for understanding impact of outages
- Experience managing teams through transitional change
- Strong communication with the team, across sites, and with stakeholders, service owners, and senior leadership