Senior Site Reliability Engineer
About Ubisoft & Shanghai Studio:
Ubisoft’s 19,000 team members, working across more than 40 locations around the world, are bound by a common mission to enrich players’ lives with original and memorable gaming experiences. Their dedication and talent have brought to life many acclaimed franchises such as Assassin’s Creed, Far Cry, Watch Dogs, Just Dance, Rainbow Six, and many more to come.
Ubisoft is an equal opportunity employer that believes diverse backgrounds and perspectives are key to creating worlds where both players and teams can thrive and express themselves. If you are excited about solving game-changing challenges, working with cutting-edge technologies and pushing the boundaries of entertainment, we invite you to join our journey and help us create the unknown.
Created in 1996, the Ubisoft Shanghai studio is a vibrant and exciting place where our 600+ talents get opportunities to co-develop great AAA blockbuster games, create cutting-edge online games or produce fun mobile games.
The Site Reliability Engineer (SRE) is responsible for operations and development tasks such as level 4 support and the implementation of highly scalable game infrastructure. The SRE works as the infrastructure services integrator who enables production teams to build games using the principles of cloud-native, DevOps and continuous delivery. The SRE has a good development background with knowledge of infrastructure and automation.
The main and routine tasks of this position are to:
- Design and/or implement a highly scalable cloud and bare-metal server and network infrastructure
- Share responsibility and ownership of game functions and services with developers who create them
- Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
- Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Practice sustainable incident response and blameless postmortems.
- Debug and optimize code and automate routine tasks (“toil”)
- Consult on the game's software and data architecture to ensure maximum infrastructure scalability
- Ensure the reliability and consistency of game data
- Work with developers to build adequate monitoring, and watch system events to ensure health, maximum availability and service quality
- Assist in evaluating new requirements, technical design and standards
- Reduce the cost of failure for changes
- Define prescriptive ways to measure reliability
"Here’s what you do when someone breaks something or finds something very difficult to debug: You say thank you. Thank you for finding this edge case. Thank you for highlighting this overcomplicated part of our system. Thank you for pointing out this gap in our docs. And then you go make it so nobody can break it the same way again."
A baccalaureate degree or equivalent experience in Computer Information Systems, Computer Science, Mathematics or a related field.
- 2+ years of experience with software development or 5+ years of automation-focused system administration with hybrid hosting solutions.
- Experience in one or more of the following: C, C++, Java, Python, Go, Perl or Ruby.
- Self-driven and slightly paranoid about system stability
- Ability to teach fundamental principles to other engineers/experts.
- Skill in developing techniques and methodologies to resolve unprecedented problems or situations
- Ability to make complex information accessible to non-technical people
- In-depth knowledge of Linux system internals and operating system design
- In-depth understanding of public cloud providers and the OpenStack platform
- Proficiency with orchestration systems such as Kubernetes
- Proficiency with relational database systems like MySQL
- Proficiency with document storage systems like MongoDB
- Experience with infrastructure orchestration using Terraform
- In-depth understanding of configuration management systems like SaltStack, Chef, Puppet and Ansible is an asset
We offer motivating salaries, performance bonuses, medical services to keep you safe and sound, meal tickets you can use wherever you want and free access to a relaxation and fitness room.
But most of all, we guarantee you’ll enjoy our atmosphere and working environment.
Ubisoft is a leading creator, publisher and distributor of interactive entertainment and services, with a rich portfolio of world-renowned brands, including Assassin’s Creed, Just Dance, Tom Clancy’s video game series, Rayman, Far Cry and Watch Dogs. The teams throughout Ubisoft’s worldwide network of studios and business offices are committed to delivering original and memorable gaming experiences across all popular platforms, including consoles, mobile phones, tablets and PCs.
Come and join our team of over 1400 professionals and help us create highly appreciated interactive entertainment products!