Wildlife Studios

Manager Site Reliability Engineer

🇦🇷 Buenos Aires

About the team

An engineer on our team works with global-scale infrastructure and has a great impact on millions of players. To guarantee the best possible experience, we rely on several Kubernetes clusters spread around the world and connected to each other. We are at the cutting edge of open-source infrastructure technology: we adopted Kubernetes in production shortly after the project was launched, and today we use technologies such as eBPF and Cilium in our network stack.

In Buenos Aires, we'll focus on developing a brand-new commercial product with the following challenges:

  • Machine-learning-based service handling more than 1 million QPS with single-digit-millisecond response times.
  • Cloud-native platform, Data Engineering, Data Science, Information Security COE.
  • Real-time advertising inventory bidding exchange.
  • Statistical testing and analytics tools for creatives.
  • Machine-learning-powered mediation layer.

About the role

Wildlife Studios is searching for a manager of infrastructure/site reliability engineers to join our team! We are looking for a profile with solid knowledge of programming, networking, and operating systems. Since we are always looking for new tools and technologies that better solve our problems, we value professionals who like to learn new things and who are autonomous and proactive in bringing and implementing their ideas.

We'll need you to understand our system flows, diagnose problems in the production environment, identify opportunities for improvement and automation, and ensure that we have the infrastructure needed to create the best games in the world.

More about you:

  • Player focused. We are player oriented, and infrastructure has a great impact on their experience. You have empathy for our players and focus on ensuring they have an amazing experience. You aim for top-level infrastructure, guaranteeing the highest availability possible.
  • Automation is key to scaling. We look for engineers who have a track record of designing and executing automation projects to eliminate manual, repetitive tasks.
  • Calm and pragmatism. When everything seems to be falling apart around you, you have a plan and keep calm.
  • Bleeding edge. You are curious and like to study new technologies, test new solutions, and measure the impact of changes. We want to ensure we are using the best stack possible.

What you’ll do:

  • Lead and contribute to the design of CI/CD pipelines, using best practices around automation, pushing changes that improve reliability and velocity.
  • Own end-to-end availability and performance of key services and build automation to prevent problem recurrence. Automate response to all non-exceptional service conditions.
  • Provide mentorship and training on CI/CD pipelines and processes. Drive education and knowledge transfer of design patterns.
  • Provide leadership and prioritization to an experienced team of 4-5 site reliability engineers, and help make the key trade-offs required to keep the team working effectively.
  • Drive innovation through research, setting the direction and standards in the use of technical solutions.
  • Have an enormous impact by working closely with teams across our organization and advocating for SRE principles.
  • Drive collaboration and agreement across disciplines and geographically dispersed teams, ensuring they follow best practices.
  • Participate actively in recruiting and process refinement. You’ll help improve internal practices and standards for bringing new candidates to your team and the company.

What you'll need:

  • 4+ years of experience as an SRE.
  • Willingness to be a hands-on contributor.
  • A strong background in programming or systems administration.
  • Experience defining KPIs/SLAs and managing teams to excel at meeting them.
  • Experience with public cloud; we have a significant presence in AWS.
  • Experience with Kubernetes and/or other container orchestration platforms.
  • Experience with incident response and on-call processes and tools.
  • A drive to challenge the status quo and work with the team to design simple and flexible solutions.

We welcome people from all backgrounds who seek the opportunity to help build the best gaming company, where everyone thrives!