Staff Cloud Operations Engineer - San Jose HQ
Extreme Networks
Job Summary
Extreme Networks is seeking a Staff Cloud Operations Engineer for its San Jose HQ. This role involves managing and maintaining ExtremeCloud IQ service infrastructure across AWS, GCP, and Azure, participating in continuous cloud service operations, and troubleshooting production issues. The engineer will also drive root cause analysis, communicate with Dev/QA, participate in release deployments, and design and implement deployment automation for Kubernetes-based microservices. The position focuses on improving service availability, scalability, and monitoring, and contributing to cloud security and compliance.
Must Have
- Manage and maintain ExtremeCloud IQ service infrastructure in AWS, GCP & Azure.
- Participate in continuous cloud service operations with US, China and India teams.
- Troubleshoot and follow up on production infrastructure / application related issues.
- Drive root cause analysis and resolution.
- Communicate with Dev/QA as well as external carriers to resolve and prevent issues.
- Participate in release deployment, system maintenance and cloud expansion.
- Design and implement a deployment automation platform for Kubernetes-based microservices.
- Improve service availability and scalability through tuning, automation, tools, and process.
- Analyze service performance, identify bottlenecks, and provide actionable improvement plans.
- Improve service monitoring coverage, accuracy and efficiency.
- Participate in cloud security and compliance implementation.
- BS level technical degree required; Computer Science or Engineering background preferred.
- 8+ years of experience in a CloudOps / DevOps role.
- Hands-on experience with AWS or another public cloud (Azure, GCP, etc.).
- Knowledge of Linux, security and networking fundamentals.
- Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
- Working knowledge of deployment automation development (Ansible, Terraform, Helm).
- Experience in diagnosing and resolving complex application problems.
- Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, and RabbitMQ.
- Experience with monitoring tools (Nagios, Kibana, Prometheus).
- Strong follow-through and initiative to stay with issues until they are resolved.
- Comfortable working within a distributed team located in multiple time zones.
Good to Have
- Experience with cloud security and compliance implementation is a plus.
Job Description
There has never been a better time to join Extreme. After three acquisitions that extended our portfolio and go-to-market strategy, we have seen enormous opportunity and growth across our regions. Aside from being a Technology Leader in the Gartner Magic Quadrant, we also adamantly promote an internal culture that truly embraces diversity, inclusion and equality in the workplace. With Diversity and Inclusion as part of our core values and beliefs, we’re proud to foster an environment where every Extreme employee can thrive because of their differences, not despite them.
Responsibilities:
- Manage and maintain ExtremeCloud IQ service infrastructure in AWS, GCP & Azure.
- Participate in continuous cloud service operations with US, China and India teams.
- Troubleshoot and follow up on production infrastructure / application related issues.
- Drive root cause analysis and resolution.
- Communicate with Dev/QA as well as external carriers to resolve and prevent issues.
- Participate in release deployment, system maintenance and cloud expansion.
- Design and implement a deployment automation platform for Kubernetes-based microservices.
- Improve service availability and scalability through tuning, automation, tools, and process.
- Analyze service performance, identify bottlenecks, and provide actionable improvement plans.
- Improve service monitoring coverage, accuracy and efficiency.
- Participate in cloud security and compliance implementation.
Qualifications:
- BS level technical degree required; Computer Science or Engineering background preferred.
- 8+ years of experience in a CloudOps / DevOps role.
- Hands-on experience with AWS or another public cloud (Azure, GCP, etc.).
- Knowledge of Linux, security and networking fundamentals.
- Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
- Working knowledge of deployment automation development (Ansible, Terraform, Helm).
- Experience in diagnosing and resolving complex application problems.
- Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, and RabbitMQ.
- Experience with monitoring tools (Nagios, Kibana, Prometheus).
- Experience with cloud security and compliance implementation is a plus.
- Strong follow-through and initiative to stay with issues until they are resolved.
- Comfortable working within a distributed team located in multiple time zones.