This job listing expired on Sep 24, 2020

Responsibilities:

  • Develop, mentor, and oversee the operations of our infrastructure team managing cloud technologies and automation development
  • Provide technical leadership to the teams to:
    • Design, develop and manage services deployment pipelines in collaboration with services development teams
    • Design tools and processes to ensure high availability/reliability of various applications
    • Develop monitoring and alerting solutions to track critical service operations metrics and report deviations
    • Design and tune service deployment parameters for optimal performance
    • Track capacity and resource consumption, forecast capacity requirements and cost
  • Proactively identify potential technical challenges and ensure that the team makes solid, pragmatic technical decisions
  • Build a team culture that aims for high service availability, scalability, and observability
  • Stay keenly aware of engineering processes and tooling. Actively seek ways to improve them
  • Work with other engineering teams on automation initiatives, decisions and troubleshooting
  • Define and report on metrics relating to SLAs and uptime

Qualifications:

  • 15+ years working in the software industry, with at least the last 7 years spent building and managing teams / shipping enterprise software through multiple releases
  • 15+ years working with Linux (Debian/RHEL based). Extensive knowledge of systems management best practices and fundamentals
  • Deep working experience with VMware virtualization platforms or others such as Amazon Web Services, Google Cloud, etc.
  • 8+ years of experience in managing production-critical infrastructures and DevOps environments
  • 5+ years of work experience in Site Reliability/Infrastructure Engineering for a team operating distributed systems/cloud infrastructure
  • Kubernetes / Mesos deployment and management experience - ECS, EKS and/or kOps deployments
  • Strong self-starter and problem-solver who is operationally focused and has a holistic data perspective
  • Knowledgeable in network, firewall, and security best practices
  • Extensive experience with infrastructure automation and monitoring distributed systems
  • Demonstrated ability to understand and solve deep technical issues
  • Prior experience with cloud migrations is a plus
  • Strong software development and project management fundamentals