Site Reliability Engineer (Mid/Senior)

Razer
🇸🇬 Singapore
Contract: Full Time
Experience Level: Senior (5+ years)

Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working with a team spread across 5 continents. Razer is also a great place to work, providing you the unique, gamer-centric #LifeAtRazer experience that will put you on a path of accelerated growth, both personally and professionally.

Job Responsibilities:

We are looking for Site Reliability Engineers (SRE) to join our AI Software team. In this role, you will ensure the reliability, performance, scalability, and operational excellence of AI products, model-serving infrastructure, and backend API systems. You’ll work closely with software engineers, AI teams and release teams to automate operations, enhance observability, and streamline deployments in a cloud-scale environment. This role is ideal for someone who enjoys building resilient systems, solving complex infrastructure problems, and supporting AI workloads in production.

Essential Duties and Responsibilities 

  • Administer, monitor, and manage cloud-scale production environments for AI model APIs, backend services, and high-traffic web systems serving global users. 

  • Design and implement fault-tolerant, autoscaling cloud architectures tailored for AI inference workloads, including GPU-based environments and software products. 

  • Build automated self-recovery systems to ensure high availability, rapid failover, and cost-efficient resource usage for all software products. 

  • Manage and monitor AI model-serving platforms, inference engines, vector databases, data pipelines, and software applications. 

  • Ensure reliability and uptime for experimental and production AI software environments. 

  • Implement and maintain comprehensive monitoring, logging, and alerting for all AI and backend services. 

  • Reduce MTTR through actionable alerts, runbooks, and automated diagnostics. 

  • Automate infrastructure using IaC (Terraform/CloudFormation) and configuration management. 

  • Improve release workflows and integrate with QA for smooth handoff to Release Candidate testing. 

  • Work closely with software engineering, ML engineering, and release management to enhance operational procedures, deployment processes, and incident response workflows. 

  • Participate in on-call rotations, incident reviews, and continuous improvement initiatives. 

Pre-Requisites:

Qualifications 

  • 4+ years of relevant experience in SRE, DevOps, infrastructure engineering, or cloud operations 

  • Experience operating production services with significant availability or scaling demands. 

  • Strong knowledge of web technologies such as HTTP, REST, SSL, load balancers, and web proxies (NGINX) 

  • Comfortable with Linux and Docker administration 

  • Basic knowledge of AWS, CI/CD (Jenkins), IaC (Terraform), container orchestration (AWS ECS or K8s), version control (Git), and databases (MySQL, NoSQL) 

  • Strong ability to code and script (preferably in Bash and Python) 

  • Ability to use or quickly pick up a wide variety of open source technologies and automation tools 

  • Understanding of GPU-based workloads and resource scheduling. 

  • Familiarity with vector databases, embeddings, and inference pipelines 

  • Comfort with frequent, incremental code testing and deployment 

  • Good analytical skills, with the ability to debug deployment problems without relying on developers 

  • Deep hands-on technical expertise and problem-solving skills 

  • Ability to work in a collaborative, technically challenging environment with rapidly changing requirements. 

Education & Experience 

  • Bachelor’s or Master’s degree in computer science, AI, or a similar discipline from an accredited institution 

Travel Requirements 

  • The role is based in the Singapore office and may require up to one trip per year. 

Are you game?
