At Blizzard Entertainment, our Site Reliability Engineers (SREs) use systems expertise combined with software engineering patterns to help define, create, and support the architecture, build systems, orchestration, and operations of services across the business. The team is made up of talented engineers who focus on evangelizing reliability-as-a-feature through monitoring, service-level objectives, automation, everything-as-code, and testing.

Blizzard's games and platforms reach a global audience of passionate gamers. The scale is massive and the challenges are very real, but the wise application of technology is what keeps it all running reliably with minimal oversight. Our Site Reliability Engineers are at the heart of this effort, working directly with the engineering teams from idea to launch to deliver the most epic (and reliable!) experiences... ever.

As an SRE on Team 2, you'll...

Embed and work closely with the World of Warcraft team to increase velocity, improve reliability, and ensure rock solid product and patch launches!
Use a combination of Ansible, Jenkins, Puppet, and Terraform to orchestrate WoW's infrastructure and deployments across our on-prem OpenStack cloud, GCP, and Alibaba.
Wield Python to iteratively improve WoW's automation framework, which is integral to many facets of the infrastructure.
Level up observability throughout the service architecture to improve incident response times and help the WoW team identify critical bugs quickly.

As an SRE at Blizzard, you may find yourself...

Being part of an on-call rotation to help resolve incidents
Hosting blameless postmortems to share learnings, discover gaps, embrace transparency, and improve reliability across our services
Building positive and collaborative relationships across the company
Employing your systems knowledge to triage problems and tune resource usage
Championing automation to reduce toil and increase development velocity
Helping define and instrument Service-Level Objectives to ensure epic player experiences
Leveraging Configuration Management to build and maintain consistency across services
Building Terraform configs to manage infrastructure in public and private clouds
Supporting and improving build pipelines with Jenkins, Argo, and/or Spinnaker
Adopting Containers and Kubernetes for new and existing services
Applying everything-as-code methodologies across configuration, infrastructure, orchestration, and elsewhere

You may succeed in this role if you...

Love to solve novel and exciting problems
Dislike solving the same problems over and over, so you automate or eliminate them
Are inspired to make everyone's job easier by improving workflows
Are comfortable digging through metrics, logs, and whatever else is available to triage and fix an incident at any time
Strive to be better, smarter, and faster tomorrow than you are today
Enjoy trying new technologies to improve what we're doing today
Are okay using older technologies that may not be perfect, but are good enough and low maintenance
Naturally spread the philosophies and practices of DevOps to others
Like to collaborate with others to solve problems, share knowledge, and provide feedback
Can self-assess the needs of a system or team, and make a case to prioritize that work
Relish working with software, network, cloud, and systems engineers to solve problems across all tiers of the stack
Help your peers succeed as much as you can

Types of projects you may work on...

Managing services and infrastructure supporting Blizzard's incredible platforms and games
Designing new code and build pipelines
Supporting current CI/CD implementations
Defining the future of running services for our platforms and games with container orchestrators
Supporting our massive global data platforms across multiple clouds
Working closely with our incubation teams to help define how future products should operate
Integrating monitoring and logging with systems to improve observability and enable Service-Level Objectives

Areas of Expertise for an SRE at Blizzard

SREs at Blizzard are expected to become experts in the technologies used by the teams they are working with. Below is a non-exhaustive list of technologies SREs may be exposed to:

Service-Level Objectives (SLI, SLO, SLA, Error Budget, Burn Rate)
Distributed Systems (system/software architectures, micro-services, high-availability)
Configuration Management (Puppet, Terraform, Ansible)
Container Computing (Docker, Kubernetes, Service Mesh)
Cloud Services and Architecture (AWS, GCP, OpenStack)
Distributed Message Bus (RabbitMQ, Kafka)
Proxies and Load Balancing (Nginx, HAProxy, ELB)
Monitoring (Kibana, Grafana, Elasticsearch)
Logging (Splunk, SysLog, ELK Stack)
Source Control (GitHub Enterprise, Perforce, SVN)
CI/CD (Jenkins, Argo, Spinnaker)
Linux (bash, debugging, tuning, performance measuring)
Networking (triaging, packet loss, routing)
Programming (Python, C#, C++)

Expectations of an SRE at Blizzard:

Familiarity with all areas of expertise, general knowledge of 5 areas of expertise, and deep knowledge of 2 areas of expertise
General knowledge of all areas of their partner team's systems
Capable of sharing ideas and technology with their peers in a clear and effective way
Builds strong relationships with their immediate team and peers
Considers others’ interests as well as their own
Creates new technical documentation on their own
Demonstrates deep understanding of the services they support and their goals
Expands knowledge on SRE best practices and anti-patterns