Every day, tens of millions of people come to Roblox to explore, create, play, learn, and connect with friends in 3D immersive digital experiences, all created by our global community of developers and creators.
At Roblox, we’re building the tools and platform that empower our community to bring any experience that they can imagine to life. Our vision is to reimagine the way people come together, from anywhere in the world, and on any device. We’re on a mission to connect a billion people with optimism and civility, and looking for amazing talent to help us get there.
A career at Roblox means you’ll be working to shape the future of human interaction, solving unique technical challenges at scale, and helping to create safer, more civil shared experiences for everyone.
We're looking for a Senior Site Reliability Operations Engineer who is passionate about solving problems to join our Site Reliability Team in a critical role within our Reliability Response division. You will have a minimum of 5 years' experience, stand out for your exceptional abilities in handling incidents, and work effectively within a complex, distributed environment full of continuous change. Your passion for getting to the root of an issue to guide long-term solutions will be important to your success in this role. You will report to the Senior Manager, Reliability.
You Will:
Lead and manage system incidents, aiming to minimize downtime.
Collaborate cross-functionally to troubleshoot and resolve sophisticated technical challenges.
Guide the implementation of incident management protocols, ensuring fast and effective responses to minimize impact.
Leverage coding skills to automate daily routine tasks and enhance system efficiency.
Continually monitor system health, performance, and capacity, proactively addressing potential issues.
Conduct comprehensive post-mortem analysis to ascertain the root cause of incidents and formulate corrective measures.
Contribute substantially to the design and enhancement of system architecture to boost reliability and performance.
Serve in the Incident Manager On-Call rotation.
Mentor junior team members.
You Have:
At least 5 years of experience in a comparable role within a Site Reliability Team.
Advanced knowledge of system and network infrastructure protocols and standards.
Demonstrated ability in managing, troubleshooting, and resolving incidents in distributed environments.
Familiarity with at least one scripting or programming language to automate routine tasks (Python, Golang, or similar languages preferred).
Experience solving problems, with a talent for working resiliently and effectively.
Communication experience with an ability to distill complex technical issues into clear and concise language.
A sense of responsibility and ownership, paired with a proactive approach.
Practical experience with HashiCorp tools.
For roles that are based at our headquarters in San Mateo, CA: The starting base pay for this position is as shown below. The actual base pay is dependent upon a variety of job-related factors such as professional background, training, work experience, location, business needs and market demand. Therefore, in some circumstances, the actual salary could fall outside of this expected range. This pay range is subject to change and may be modified in the future. All full-time employees are also eligible for equity compensation and for benefits.
Annual Salary Range
$201,410–$254,510 USD
You’ll Love:
Industry-leading compensation package
Excellent medical, dental, and vision coverage
A rewarding 401k program
Flexible vacation policy
Roflex - Flexible and supportive work policy
Roblox Admin badge for your avatar
At Roblox HQ:
Free catered lunches five times a week and several fully stocked kitchens with unlimited snacks
Onsite fitness center and fitness program credit
Annual Caltrain Go Pass
Roblox provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.