Director of Software Engineering, AI Deployment

10+ Years
Research Development

Job Description

As Director of Software Engineering, AI Deployment, you will lead teams in engineering scalable, resilient, and observable AI model services and supporting infrastructure. This role focuses on bringing cutting-edge AI models to production, overseeing backend APIs, automation tooling, and deployment pipelines, and defining performance and reliability standards. You will operate at the intersection of software engineering, DevOps, and AI, ensuring systems are production-ready and optimized for performance and cost-efficiency.
Good To Have:
  • Familiarity with LLM pipelines, streaming inference, or hybrid deployment environments (cloud + edge).
  • Prior ownership of large-scale AI delivery platforms or model hosting infrastructure.
  • Ability and willingness to learn new technologies and apply them at work to stay ahead in a fast-paced, high-pressure, agile environment.
  • Excellent problem-solving, analytical, and decision-making abilities.
  • Strong communication and stakeholder management skills.
Must Have:
  • Lead the design and engineering of scalable, robust, and testable software systems that wrap and serve AI/ML models.
  • Drive development of reusable APIs, frameworks, and libraries to accelerate integration of AI into customer-facing products.
  • Oversee engineering of high-performance model inference systems, with a focus on both cloud-native and on-premises environments.
  • Architect backend services that are API-first, containerized, and designed for high availability.
  • Ensure all services are testable and observable, and that they meet the handoff criteria for release-candidate testing by the QA team, supporting continuous integration, automated validation, and smooth production rollout.
  • Define and implement SLOs, SLIs, and error budgets for model-backed services.
  • Drive implementation of robust monitoring, alerting, logging, and auto-recovery mechanisms.
  • Build resilience and observability into AI systems by design, and implement incident response protocols, runbooks, and reliability audits.
  • Lead efforts to optimize AI model serving performance: memory, compute, GPU usage, latency, and cost-efficiency.
  • Architect systems that can scale elastically based on demand, while maintaining deterministic behavior and uptime guarantees.
  • Oversee buildout of deployment automation tools, CI/CD for models and software components, and rollback systems.
  • Manage and grow a team of software and systems engineers responsible for end-to-end AI system readiness.
  • Set strategy for software delivery, technical quality, operational metrics, and performance benchmarks.
  • 10+ years in software engineering, with 4+ years in engineering leadership or director roles.
  • Demonstrated experience building and running production-grade AI/ML systems.
  • Deep expertise in backend development, API design, and cloud infrastructure (AWS, GCP, or Azure).
  • Solid grounding in SRE principles — including incident response, observability, error budgeting, and reliability metrics.
  • Strong knowledge of site reliability tooling (e.g., Prometheus, Grafana, OpenTelemetry, Sentry).
  • Familiarity with model serving frameworks (e.g., Triton, TorchServe, Ray Serve) and GPU compute orchestration.
  • Experience with CI/CD, the software development lifecycle (SDLC), AI model lifecycle tooling, and infrastructure-as-code.
  • Bachelor's or Master's in Computer Science, Software Engineering, or equivalent.
Perks:
  • Global mission to revolutionize the way the world games
  • Opportunity to make an impact globally
  • Work with a global team located across 5 continents
  • Unique, gamer-centric #LifeAtRazer experience
  • Accelerated growth, both personally and professionally
  • Certified as a Great Place to Work® in both the United States and Singapore

Job Responsibilities:

We are seeking a Director of Software Engineering, AI Deployment with a strong SRE orientation to lead the software and systems engineering required to bring cutting-edge AI models to production. This role is responsible for engineering AI model services, building supporting infrastructure, and ensuring that the systems are scalable, resilient, observable, and production-ready.

You’ll oversee teams building backend APIs, automation tooling, and deployment pipelines, while also defining performance, availability, and reliability standards. This role operates at the intersection of software engineering, DevOps, and AI.

Essential Duties and Responsibilities

  • Lead the design and engineering of scalable, robust, and testable software systems that wrap and serve AI/ML models.
  • Drive development of reusable APIs, frameworks, and libraries to accelerate integration of AI into customer-facing products.
  • Oversee engineering of high-performance model inference systems, with a focus on both cloud-native and on-premises environments.
  • Architect backend services that are API-first, containerized, and designed for high availability.
  • Ensure all services are testable and observable, and that they meet the handoff criteria for release-candidate testing by the QA team, supporting continuous integration, automated validation, and smooth production rollout.
  • Define and implement SLOs, SLIs, and error budgets for model-backed services.
  • Drive implementation of robust monitoring, alerting, logging, and auto-recovery mechanisms.
  • Build resilience and observability into AI systems by design, and implement incident response protocols, runbooks, and reliability audits.
  • Lead efforts to optimize AI model serving performance: memory, compute, GPU usage, latency, and cost-efficiency.
  • Architect systems that can scale elastically based on demand, while maintaining deterministic behavior and uptime guarantees.
  • Oversee buildout of deployment automation tools, CI/CD for models and software components, and rollback systems.
  • Manage and grow a team of software and systems engineers responsible for end-to-end AI system readiness.
  • Set strategy for software delivery, technical quality, operational metrics, and performance benchmarks.

Prerequisites:

Qualifications

  • 10+ years in software engineering, with 4+ years in engineering leadership or director roles.
  • Demonstrated experience building and running production-grade AI/ML systems.
  • Deep expertise in backend development, API design, and cloud infrastructure (AWS, GCP, or Azure).
  • Solid grounding in SRE principles — including incident response, observability, error budgeting, and reliability metrics.
  • Strong knowledge of site reliability tooling (e.g., Prometheus, Grafana, OpenTelemetry, Sentry).
  • Familiarity with model serving frameworks (e.g., Triton, TorchServe, Ray Serve) and GPU compute orchestration.
  • Familiarity with LLM pipelines, streaming inference, or hybrid deployment environments (cloud + edge).
  • Prior ownership of large-scale AI delivery platforms or model hosting infrastructure.
  • Experience with CI/CD, the software development lifecycle (SDLC), AI model lifecycle tooling, and infrastructure-as-code.
  • Excellent problem-solving, analytical, and decision-making abilities.
  • Strong communication and stakeholder management skills.
  • Ability and willingness to learn new technologies and apply them at work to stay ahead in a fast-paced, high-pressure, agile environment.
  • Excellent written and verbal communication skills for coordinating across teams.

Education & Experience

Bachelor's or Master's in Computer Science, Software Engineering, or equivalent

Travel Requirements

  • This role is based in the Singapore office and may require up to one trip per year.
