This job listing expired on Sep 24, 2020
Responsibilities:
- Develop, mentor, and oversee the operations of our infrastructure team managing cloud technologies and automation development
- Provide technical leadership to the teams to:
- Design, develop and manage services deployment pipelines in collaboration with services development teams
- Design tools and processes to ensure high availability/reliability of various applications
- Develop monitoring and alerting solutions to track critical service operations metrics and report deviations
- Design and tune service deployment parameters for optimal performance
- Track capacity and resource consumption, forecast capacity requirements and cost
- Proactively identify potential technical challenges and ensure that the team makes solid, pragmatic technical decisions
- Build a team culture that aims for high service availability, scalability, and observability
- Stay keenly aware of engineering processes and tooling, and actively seek ways to improve them
- Work with other engineering teams on automation initiatives, decisions and troubleshooting
- Define and report on metrics relating to SLAs and uptime
Qualifications:
- 15+ years working in the software industry, including at least the last 7 years building and managing teams / shipping enterprise software through multiple releases
- 15+ years working with Linux (Debian/RHEL-based). Extensive knowledge of systems management best practices and fundamentals
- Deep working experience with VMware virtualization platforms or others such as Amazon Web Services and Google Cloud
- 8+ years of experience in managing production-critical infrastructures and DevOps environments
- 5+ years of work experience in Site Reliability/Infrastructure Engineering for a team operating distributed systems/cloud infrastructure
- Kubernetes / Mesos deployment and management experience, including ECS, EKS, and/or KOPS deployments
- Strong self-starter and problem-solver who is operationally focused and has a holistic data perspective
- Knowledgeable in network, firewall, and security best practices
- Extensive experience with infrastructure automation and with monitoring distributed systems
- Demonstrated ability to understand and solve deep technical issues
- Prior experience with cloud migrations is a plus
- Strong software development and project management fundamentals