Cloud Infrastructure Engineer (AWS / Kubernetes / SRE)

Posted 1 month ago • 5+ years • DevOps

Job Details

WHO WE ARE 🌍

Manychat is a leading Chat Marketing platform. We help businesses engage with their customers on Instagram, Facebook Messenger, WhatsApp, and Telegram.

Trusted by over 1 million brands in 170+ countries, we're an official Meta Business Partner, backed by top investors, including Bessemer Venture Partners.

With 200+ teammates across international offices in Barcelona, Austin, Amsterdam, São Paulo, and Yerevan, Manychat helps businesses across the globe improve their ROI and grow faster.

ABOUT THE ROLE 🚀

We’re looking for a Cloud Infrastructure Engineer who thrives at the crossroads of classic Linux and AWS infrastructure and modern Site Reliability Engineering. This is a high-impact, hybrid role designed for someone who can manage cloud resources, harden Kubernetes clusters, and shape a more reliable and developer-friendly platform.

We need you not just to maintain but to rethink and evolve our infrastructure, balancing hands-on operations with strategic improvements that future-proof our growing AI product landscape.

You’ll take over key responsibilities from our current Infra Lead who is transitioning to a software-focused role, giving you immediate ownership and space to shine.

WHY THE ROLE IS SPECIAL 💡

You won’t be a cog in a massive SRE org. You’ll be the bridge between Infrastructure and Engineering, shaping how we scale Kubernetes, how we approach platform reliability, and how developers ship fast without fear. You’ll get autonomy, ownership, and a smart, humble team excited to learn with you.

WHAT YOU’LL DO 🤖

  • Maintain and harden AWS infrastructure (EC2, ALB/NLB, WAF, IAM, CloudWatch)
  • Operate and evolve our EKS clusters powering Python-based AI services
  • Migrate existing services to Kubernetes using Terraform and Helm
  • Codify infrastructure with Terraform and manage host-level automation via Ansible
  • Build and improve CI/CD pipelines with GitHub Actions
  • Own observability efforts: Prometheus, Grafana, alerting, and on-call readiness
  • Support OS-level patching, certs, WAF rules, and general infra hygiene
  • Partner with engineers to guide best practices and drive platform reliability
  • Create clean, maintainable infrastructure documentation and playbooks
  • Support occasional off-hours incidents (don’t worry, they’re really rare)

WHAT YOU’LL BRING 💥

  • 5+ years of experience managing Linux in production (Ubuntu, Amazon Linux)
  • Strong experience with Kubernetes (ideally EKS), Helm, and Terraform
  • Comfort with running and debugging Python workloads in containers
  • Solid understanding of networking, IAM, and cloud security best practices
  • Hands-on Nginx experience (Ingress and reverse proxy setups)
  • Excellent communication skills; you can explain complex infra to devs clearly

NICE TO HAVE SKILLS 🛠️

  • Strong Ansible skills beyond the basics
  • PostgreSQL or Amazon RDS tuning and operations experience
  • Deep understanding of observability tools (Prometheus, Grafana, Loki, etc.)
  • Familiarity with PHP production environments
  • Experience with TDD, CI/CD best practices, and agile development
  • Any previous SRE-like exposure such as building resilience, automation, or incident tooling

WHAT WE OFFER 🤗

We care deeply about your growth, well-being, and comfort:

  • 💙 Comprehensive health insurance for both you and your family.
  • 📚 Professional development budget for conference tickets, online courses, and other relevant resources to help you grow.
  • 🫶 Flexible benefits package so you can choose the perks that matter most to you.
  • 🪴 Hybrid work and generous leave options to prioritize your work-life balance.
  • 🍽️ In-office perks, including free meals and snacks.
  • 🤝 Company-funded sport activities, annual offsites, and team-building events.
