DevOps Engineer
Level AI
Job Summary
Level AI is an Enterprise SaaS startup focused on building AI tools to augment human capabilities, specifically within contact centers. Founded in 2019 and headquartered in Mountain View, California, Level AI is a Series C startup. Its AI-native platform utilizes advanced technologies like Large Language Models to extract deep insights from customer interactions, thereby enhancing customer experience and driving growth. The DevOps Engineer will be responsible for designing, building, and enhancing ML system infrastructure, tracking performance, evaluating tools, and collaborating with the AI team to bring ML projects to production.
Job Description
Level AI (thelevel.ai) is an Enterprise SaaS startup. Our vision is to build AI tools that augment, not replace, humans. Our first market is contact centers.
Level AI was founded in 2019 and is a Series C startup headquartered in Mountain View, California. Level AI revolutionizes customer engagement by transforming contact centers into strategic assets. Our AI-native platform leverages advanced technologies such as Large Language Models to extract deep insights from customer interactions. By providing actionable intelligence, Level AI empowers organizations to enhance customer experience and drive growth. Consistently updated with the latest AI innovations, Level AI stands as the most adaptive and forward-thinking solution in the industry.
Responsibilities:
- Design, build, and enhance core components of state-of-the-art machine learning system infrastructure (cloud and on-premise), and architect platforms to create, train, and deploy ML models.
- Build operational dashboards and charts to track system errors and performance and to enable root-cause analysis.
- Identify gaps and evaluate relevant tools and technologies as needed to improve processes and systems, leveraging open-source and cloud computing technologies to build effective solutions.
- Collaborate with the AI team to drive ML projects from conception to completion and production monitoring.
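To give a flavor of the dashboarding and alerting work described above, a minimal Prometheus alerting rule might look like the following sketch (the service name, metric labels, and thresholds are purely illustrative, not part of our stack definition):

```yaml
# Hypothetical Prometheus alerting rule: fire when an example service's
# HTTP 5xx error rate exceeds 5% of all requests over 10 minutes.
groups:
  - name: example-service-errors
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{job="example-service", status=~"5.."}[10m]))
            / sum(rate(http_requests_total{job="example-service"}[10m])) > 0.05
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "example-service 5xx error rate above 5% for 10 minutes"
```

A rule like this would typically back both a Grafana error-rate panel and the paging pipeline used for root-cause analysis.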
Requirements:
- Bachelor's degree or above with a strong academic background.
- 2-4 years of meaningful work experience in DevOps handling complex services.
- Strong troubleshooting skills to keep our services highly available.
- Strong expertise and experience with Google Cloud Platform (GCP), Docker, Kubernetes, CI/CD, and Jenkins.
- Extensive experience in designing, implementing, and maintaining infrastructure as code, preferably using Terraform.
- Experience creating and maintaining deployment manifests for microservices using Helm.
- LLMOps or MLOps experience is a bonus.
- Strong expertise with deployment at scale on a Kubernetes cluster via the Horizontal Pod Autoscaler (HPA).
- Broad technical background and experience with the architecture, design, and operations of cloud solutions, and with meeting security compliance requirements.
- Experience monitoring system health and ensuring security, scalability, and reliability.
- Experience designing, implementing, and maintaining observability, monitoring, logging, and alerting using tools such as Prometheus, Grafana, Promtail, Loki, and Datadog.
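The HPA-based scaling mentioned above can be sketched as a minimal Kubernetes manifest (the workload name, replica bounds, and CPU target are hypothetical examples, not a description of our production setup):

```yaml
# Illustrative HorizontalPodAutoscaler: scale a hypothetical "inference-api"
# Deployment between 3 and 50 replicas to hold average CPU near 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

In practice a manifest like this would live in a Helm chart template, with the replica bounds and utilization target exposed as chart values.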
Compensation: We offer market-leading compensation, based on the skills and aptitude of the candidate.
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.