This job listing expired on Nov 27, 2019

NVIDIA Cloud Gaming (GeForce Now) is looking for a passionate member to join our Engineering Productivity Team as an Incident Manager. In this role, you will play a significant part in helping to craft and guide the future of Cloud Gaming. This role will require you to demonstrate a blend of solid critical thinking, problem solving, and attention to detail with the ability to achieve results in a highly matrixed global working environment.

What you’ll be doing:

  • You will develop and use metrics (Key Performance Indicators) to validate success and identify improvements to the service. You will coordinate teams during incidents and drive the triage process in real time to reduce Mean Time To Repair (MTTR).
  • You will be responsible for creating incident playbooks and will lead the Root Cause and Corrective Action (RCCA) process across the organization, so that we learn from incidents and do not repeat them.
  • You will improve problem management methods and procedures, and ensure they are used to analyze problems efficiently and promptly and to put preventive measures in place.
  • You will apply automation and tooling to common issues, taking repetitive work off managers and engineers.
  • You will also take part in operational duties such as automating deployments and monitoring the performance of deployed applications.
  • You will work in a multi-functional team with Client Software, Platform Software, Infrastructure, Operations, Customer Support and Marketing to drive positive outcomes for new features.

Additionally you will:

  • Review new features so we have the operational knowledge to support them at production cloud scale, 24x7.
  • Lead the war room during a critical incident and have the authority to sign off on the corrective actions.
  • Design escalation plans for teams to ensure the right people are on call to tackle critical issues.
  • Drive fault tolerance and reliability, with a strong focus on velocity, through clean contract design and implementation.
  • Identify potential risks, provide guidance on solutions that address those risks and anticipated points of resistance, and develop specific plans to mitigate or resolve the concerns.

What we need to see:

  • BA in CS or equivalent practical experience
  • At least 3 years of experience in operations services or in project or program management, or equivalent.
  • Demonstrated understanding of cloud design in the areas of virtualization, global infrastructure, distributed systems, load balancing and security.
  • Experience building/operating cloud scale software and services.
  • Excellent interpersonal and written communication skills.
  • A track record of creating processes that solve complex problems with elegant solutions.
  • Deep understanding of Incident and Problem Management.
  • Experience with project management tools such as JIRA.
  • Experience with version control tools (Git or Perforce).
  • Experience using Kibana or Grafana to build dashboards that measure critical metrics and give us insight into our services.

Ways to stand out from the crowd:

  • Proven ability to handle critical incidents.
  • Previous use of test automation and reporting.
  • Use of scripting languages.
  • Experience with Continuous Integration and Continuous Delivery
  • Able to successfully handle a dynamic workload with minimal intervention from management.
  • A desire to automate processes and to look for opportunities that can increase software engineering cadence.
  • Outstanding communication skills, both written and in presentations.
  • Aptitude for quickly understanding work processes and flows across various business units.
  • Strong data analytics skills, working with data scientists and machine learning experts to predict problems and their likelihood.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most brilliant and talented people in the world working for us. If you're creative, fun and autonomous, we want to hear from you!

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.