Infrastructure Engineer — Systems & Platform

1+ years • $140,000 - $200,000 per year
System Design

Job Description

Sixtyfour builds AI research agents that can discover, link, and reason over information about people and companies, automating research workflows for sales, recruiting, and marketing. This role involves designing and maintaining highly available, scalable infrastructure on AWS, architecting CI/CD pipelines, optimizing LLM inference, improving deployment workflows, and enhancing system reliability and developer productivity.
Good To Have:
  • Experience managing LLM serving infrastructure (OpenAI-compatible APIs, vLLM, Triton, or similar).
  • Familiarity with Next.js and TypeScript to understand end-to-end deployment pipelines.
  • Experience with Terraform, Pulumi, or similar IaC tools.
  • Security-focused mindset, including network boundaries, secret management, and RBAC.
  • Knowledge of real-time systems (SSE or WebSockets) or stream processing.
  • Experience building developer platform tools or internal DevOps systems.
Must Have:
  • Strong experience with cloud infrastructure (AWS preferred) including EC2, ECS, EKS, Lambda, S3, VPCs, networking, and IAM.
  • Proficiency with Docker and CI/CD tools such as GitHub Actions or CircleCI.
  • Experience scaling Python backend systems and modern web APIs (FastAPI preferred).
  • Hands-on experience with API servers and background workers (Celery, Redis queues, etc.).
  • Comfort with Postgres and Redis, including schema design, caching, rate limiting, and locks.
  • Strong observability mindset, including logs, metrics, and traces.
  • Production experience with autoscaling, load testing, and cost-aware resource optimization.
  • Excellent debugging and on-call discipline with a focus on uptime and reliability.

About Sixtyfour

We build AI research agents that can discover, link, and reason over everything about people and companies. The platform turns that intelligence into automated research workflows for sales, recruiting, and marketing.

About the role

Skills: Kubernetes, Amazon Web Services (AWS)

What you’ll do

  • Design and maintain highly available, scalable infrastructure across AWS (ECS, EKS, Lambda, SQS, CloudFront, CloudWatch).
  • Architect automated CI/CD pipelines (GitHub Actions, Terraform) with strong testing, observability, and rollback safety.
  • Optimize LLM inference infrastructure, including autoscaling GPU/CPU clusters, caching, async queues, batching, and tracing.
  • Improve deployment workflows and environment consistency using Docker, IaC, and lightweight configuration management.
  • Work on backend performance, including queue throughput, caching strategies, database indexing, and load balancing.
  • Monitor, debug, and improve system reliability and latency across all services (API, inference, and web app).
  • Build internal tools that enhance developer productivity and operational visibility.
  • Partner with engineers to evolve the workflow and job execution engine for better parallelism, retry logic, and observability.
  • Set up metrics, tracing, and alerting (OpenTelemetry, Prometheus, Grafana, Sentry) to make reliability measurable and actionable.

Technology

Language Models, OpenSearch/Elasticsearch, Next.js (TypeScript), Python, FastAPI, AWS, Docker, Celery workers, Playwright, Supabase, Stripe
