Site Reliability Engineer
Founded in 2012, Cloud Imperium Games creates cutting-edge videogames that defy expectations. We’re currently developing Star Citizen, a record-breaking multiplayer online space sim, and Squadron 42, a cinematic single-player adventure set in the same universe. Join us as we break boundaries and make videogame history.
Site Reliability Engineers (SREs) split their time between operational and development concerns. They empower internal customers with a rich feature set, high availability, and stellar performance so that those customers can pursue the company’s missions. SREs should be active participants in projects designed to achieve these goals, with a keen ear for our internal customers’ needs, serving as effective advocates for and implementers of solid, reliable systems that meet them.
What we look for in a Site Reliability Engineer:
- Ability to maintain positive relationships with internal customers, representing their needs without judgement.
- Intermediate programming experience in at least one of the following languages: Python, Golang, or Scala.
- Knowledge of at least one other programming language, whether from the list above or not.
- Willingness to learn new programming languages as new tasks require them.
- Some experience programming against published APIs, or finding appropriate libraries to handle that interaction.
- Strong operational experience with an OS-agnostic mindset: i.e., administrative experience with Linux, but willing to support Windows when required.
- Real-world experience with cloud computing: basic interactions with Amazon Web Services, Google Cloud Platform, or Azure.
- Some familiarity with running processes in modern virtual abstractions: running and/or building containers, Kubernetes, deploying and configuring Helm charts.
- Some familiarity with high-level network concepts: IP networks and addressing, TCP basics, typical firewalling goals.
What you'll be doing:
- Run the production environment by monitoring availability and taking a holistic view of system health
- Be ready to respond and assist during off-hours, given the mission-critical nature of the SRE group
- Help build software, systems, and services to manage infrastructure and applications
- Improve reliability, quality, and time-to-market of company software solutions
- Measure system performance with the goal of staying ahead of customer needs and striving to continually improve
- Provide operational support and engineering for distributed software applications inside both Publishing and Development
- Partner with Development teams to improve services through testing and release procedures
- Collaborate in system design, platform management, and capacity planning
- Create sustainable systems and services through automation and uplifts
- Plan and execute migrations of existing systems to more reliable infrastructure
- Document outage policy and response guidelines
Covid-19 Hiring Update: We’ve transitioned to a work-from-home model, and we’re continuing to interview and hire during this time. This role is expected to begin as a remote position. We understand each person’s circumstances may be unique and will work with you to explore possible interim options.
CIG Diversity Statement: CIG is a global company, staunchly committed to cultivating a culture and workplace that celebrates all backgrounds, lifestyles, and perspectives. Together, we are creating a space where authentic recognition, appreciation, and understanding of the importance of diversity is fostered by everyone. As an Equal Opportunity Employer, we strive to build a team that represents all walks of life, and we want every employee to bring all the things that make them unique to the work environment. The universe is as vast and varied as the people in it, and it’s our differences that make it special.