Cloud Engineer

2 Months ago • 5 Years + • Devops

Job Summary

Job Description

ShyftLabs is looking for a Senior Cloud Engineer to design, implement, and manage cloud infrastructure for GenAI applications, focusing on AWS. Responsibilities include managing cloud resources, databases, security, container orchestration, and storage. The role involves designing feature stores, vector stores, and data ingestion frameworks, as well as implementing serverless solutions and disaster recovery strategies. The engineer will build a self-service platform, automate deployments with Infrastructure as Code (Terraform), and establish CI/CD pipelines. Additionally, they will implement monitoring, alerting, and optimize cloud costs while championing DevOps best practices and ensuring security compliance.
Must have:
  • 5+ years of hands-on experience with AWS services
  • 2+ years of hands-on experience with Databricks
  • Expert-level knowledge of AWS core services
  • Expert-level knowledge of Databricks capabilities
  • Strong proficiency with Terraform
  • Experience with containerization (Docker, Kubernetes)
  • Solid understanding of networking concepts
  • Experience with CI/CD tools
  • Proficiency in scripting languages (Python, Bash)
  • Extensive experience with AWS Lambda, API Gateway, ECS, Step Functions
  • Experience with event-driven patterns using SNS, SQS, EventBridge
  • Proven experience designing and implementing DR strategies in AWS
  • Primary cloud platform: AWS (extensive experience required)
  • Experience with monitoring tools (CloudWatch, Datadog)
  • Experience with log management systems
Good to have:
  • Familiarity with SageMaker, Bedrock, or Anthropic/Claude API integration
  • Experience with Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline
  • Experience with serverless frameworks (SAM, Serverless Framework)
  • Knowledge of serverless best practices
  • Expertise in multi-region architectures and data replication
  • Experience with AWS backup services
  • Knowledge of RTO/RPO planning
  • Hands-on experience with Route53 health checks
  • Secondary cloud platforms: Azure and Google Cloud Platform (working knowledge)
  • Multi-cloud architecture understanding
  • Experience with Prometheus, Grafana
  • Experience with the ELK stack, Splunk, or CloudWatch Logs
  • Experience with APM tools and distributed tracing
  • AWS certifications
  • Databricks Certifications
  • Experience with open-source LLMs
  • Experience with chaos engineering
  • Knowledge of security frameworks and compliance
  • Experience implementing complex build systems
  • Background in building developer platforms
  • Experience with IaC testing frameworks
Perks:
  • Competitive salary
  • Strong healthcare insurance and benefits package
  • Extensive learning and development resources

Job Details

Position Overview:
ShyftLabs is seeking a highly skilled Cloud Engineer (Senior, Data Platforms) to join our team and lead the design, implementation, and management of cloud infrastructure for our innovative GenAI applications. This role will be instrumental in building a robust platform that enables rapid experimentation and deployment while maintaining enterprise-grade security and reliability.

ShyftLabs is a growing data product company, founded in early 2020, that works primarily with Fortune 500 companies. We deliver digital solutions that help businesses across industries accelerate their growth, with a focus on creating value through innovation.

Job Responsibilities:


    • Cloud Infrastructure Management
        • Design, provision, and maintain cloud resources across AWS (primary), with capabilities to work in Azure and Google Cloud environments
        • Manage end-to-end infrastructure for full-stack GenAI applications including:
            • Database systems (Aurora, RDS, DynamoDB, DocumentDB, etc.)
            • Security groups and IAM policies
            • VPC architecture and network design
            • Container orchestration (ECS, EKS, Lambda)
            • Storage solutions (S3, EFS, etc.)
            • CDN configuration (CloudFront)
            • DNS management (Route53)
            • Load balancing and auto-scaling
    • Data & AI Platforms
        • Design feature stores, vector stores, data ingestion frameworks, and lakehouse architectures
        • Manage data governance, lineage, masking, and access controls around data products
    • Serverless Architecture
        • Design and implement serverless solutions using AWS Lambda, API Gateway, and EventBridge
        • Optimize serverless applications for performance, cost, and scalability
        • Implement event-driven architectures and asynchronous processing patterns
        • Manage serverless deployment pipelines and monitoring
    • Disaster Recovery & High Availability
        • Architect and implement comprehensive disaster recovery strategies
        • Design multi-region failover capabilities with automated recovery procedures
        • Implement RTO/RPO requirements through backup strategies and replication
        • Build auto-failover mechanisms using Route53 health checks and failover routing
        • Create and maintain disaster recovery runbooks and testing procedures
        • Ensure data durability through cross-region replication and backup strategies
    • Platform Development
        • Build and maintain a self-service platform enabling rapid experimentation and testing of GenAI applications
        • Implement Infrastructure as Code (IaC) using Terraform for consistent and repeatable deployments
        • Create streamlined CI/CD pipelines that support local-to-dev-to-prod workflows
        • Design systems that minimize deployment time and maximize developer productivity
        • Establish quick feedback loops between development and deployment
    • Monitoring & Operations
        • Implement comprehensive monitoring, observability, and alerting solutions
        • Set up logging aggregation and analysis tools
        • Ensure high availability and disaster recovery capabilities
        • Optimize cloud costs while maintaining performance
    • DevOps Excellence
        • Champion DevOps best practices across the organization
        • Automate infrastructure provisioning and application deployment
        • Implement security best practices and compliance requirements
        • Create documentation and runbooks for operational procedures

Basic Qualifications:


    • Technical Skills
        • 5+ years of hands-on experience with AWS services
        • 2+ years of hands-on experience with Databricks
        • Expert-level knowledge of AWS core services (EC2, VPC, IAM, S3, RDS, Lambda, ECS/EKS)
        • Expert-level knowledge of Databricks capabilities
        • Familiarity with SageMaker, Bedrock, or Anthropic/Claude API integration
        • Strong proficiency with Terraform for infrastructure automation
        • Demonstrated experience with containerization (Docker, Kubernetes)
        • Solid understanding of networking concepts (subnets, routing, security groups, VPN)
        • Experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline)
        • Proficiency in scripting languages (Python, Bash, PowerShell)
    • Serverless & Event-Driven Architecture
        • Extensive experience with AWS Lambda, API Gateway, ECS, Step Functions
        • Knowledge of serverless frameworks (SAM, Serverless Framework)
        • Experience with event-driven patterns using SNS, SQS, EventBridge
        • Understanding of serverless best practices and optimization techniques
    • Disaster Recovery & Business Continuity
        • Proven experience designing and implementing DR strategies in AWS
        • Expertise in multi-region architectures and data replication
        • Experience with AWS backup services and cross-region failover
        • Knowledge of RTO/RPO planning and implementation
        • Hands-on experience with Route53 health checks and failover routing policies
    • Cloud Platform Experience
        • Primary: AWS (extensive experience required)
        • Secondary: Azure and Google Cloud Platform (working knowledge)
        • Multi-cloud architecture understanding
    • Monitoring & Observability
        • Experience with monitoring tools (CloudWatch, Datadog, Prometheus, Grafana)
        • Log management systems (ELK stack, Splunk, CloudWatch Logs)
        • APM tools and distributed tracing

Preferred Qualifications:

    • AWS certifications (Solutions Architect, DevOps Engineer)
    • Databricks Certifications
    • Experience with open-source LLMs, embedding models, and RAG-based applications
    • Experience with chaos engineering and resilience testing
    • Knowledge of security frameworks and compliance (SOC2, HIPAA, PCI)
    • Experience implementing complex build systems for mono-repo micro-services architectures
    • Background in building developer platforms or internal tools
    • Experience with Infrastructure as Code testing frameworks

We are proud to offer a competitive salary alongside a strong healthcare insurance and benefits package. The role is preferably hybrid, with 2 days per week spent in the office, and flexibility for client engagement needs. We pride ourselves on the growth of our employees, offering extensive learning and development resources.

ShyftLabs is an equal-opportunity employer committed to creating a safe, diverse and inclusive environment. We encourage qualified applicants of all backgrounds including ethnicity, religion, disability status, gender identity, sexual orientation, family status, age, nationality, and education levels to apply. If you are contacted for an interview and require accommodation during the interviewing process, please let us know.

About The Company

Here at ShyftLabs, we build data products to help enterprises deliver real impact through tailored data analytics, science, and AI solutions. From consulting to operations, we guide our customers through their data journey and ensure they are data and AI-empowered.