##### Job Description
The AI, Data & Research unit is at the forefront of CyberArk’s innovation, building data-driven, ML-powered, and intelligent security solutions. We are looking for a passionate engineer to join our team of seasoned ML engineers. You will play a critical role in building a multi-tenant PaaS for ML pipelines and inference, ensuring scalability, reliability, and security. You will take ownership of critical platform components, drive best practices, and mentor other engineers.
- Design, build, and maintain infrastructure-as-code using Python and AWS services for deployment.
- Architect, build, and manage Docker-based services.
- Lead the design and implementation of solutions using AWS services such as Lambda, Step Functions, and SageMaker (Pipelines, Batch Transform, and Real-Time Endpoints).
- Enhance and maintain CI/CD pipelines (Jenkins and shared libraries).
- Ensure multi-tenant security and tenant isolation across the platform.
- Define and implement observability and monitoring practices with Datadog and other tools.
- Collaborate closely with Data Scientists, Data Engineers, ML Engineers, Product Managers, and other engineering teams to integrate ML workflows.
- Mentor junior engineers and promote engineering best practices.
##### Qualifications
- Bachelor’s degree in Computer Science, Software Engineering, or a related field.
- 4+ years of hands-on development experience with Python and AWS.
- Proven experience with infrastructure as code (preferably AWS CDK, Terraform, or CloudFormation).
- Strong knowledge of AWS architecture and services, particularly in data/ML workloads.
- Deep experience with CI/CD pipelines (Jenkins or similar).
- Strong expertise in Docker and containerized applications.
- Demonstrated knowledge of cloud security, scalability, and tenant isolation.
- Hands-on experience with observability platforms (preferably Datadog).
- Self-motivated and goal-oriented, with a strong work ethic.
##### Additional Information
- Background in MLOps, Data Platforms, or Machine Learning workflows.
- Experience with additional monitoring and logging tools (CloudWatch, Prometheus, ELK).
- Leadership experience in scaling cloud-native platforms.
- Experience in information security is an advantage.
- Understanding of identity & access management, secrets management, or zero-trust architecture is a bonus.