Description
Role Mission:
To architect, build, and own a secure, scalable, and cost-effective platform on Google Cloud Platform. This is a mission-critical leadership role responsible for transforming our client’s current infrastructure from a complex, reactive state into a stable, proactive, and automated environment. You will be the ultimate technical owner for all aspects of the platform's reliability, security, and performance.
Key Responsibilities:
- Platform Architecture & Security: Design, implement, and own a standardized, secure, and automated cloud infrastructure on GCP. Lead the security hardening of our platform, including migrating services to private GKE clusters, implementing network security policies (VPCs, firewalls), and establishing the technical controls required for SOC 2 compliance.
- Infrastructure as Code & CI/CD: Establish and enforce best practices for Infrastructure as Code using Terraform. Take ownership of and standardize our CI/CD pipelines to ensure reliable, repeatable deployments.
- Cost & Performance Optimization: Drive a comprehensive cost and performance optimization initiative across GCP. This includes a specific focus on diagnosing and remediating inefficiencies in our MongoDB Atlas implementation, network architecture, and resource utilization.
- Observability & Reliability: Implement a robust observability stack (logging, monitoring, alerting) and establish a clear incident response protocol, applying SRE principles to ensure platform reliability.
- Technical Leadership: Serve as the subject matter expert and single point of accountability for the entire platform infrastructure. Mentor other team members on DevOps best practices.
Requirements:
- 7+ years of hands-on experience in a DevOps, SRE, or Platform Engineering role, with a strong emphasis on Google Cloud Platform (GCP).
- Expert-level knowledge of GCP services, including GKE, Cloud Run, VPC, Cloud Storage, and IAM.
- Proven experience diagnosing, optimizing, and managing MongoDB instances. Experience with Atlas performance tuning and cost management is a major plus.
- Deep proficiency with Infrastructure as Code (Terraform is required).
- Strong experience building and maintaining CI/CD pipelines (e.g., Jenkins, Argo CD, Cloud Build).
- A proactive, ownership-driven mindset with a demonstrated ability to operate independently and turn chaos into order.
- Excellent problem-solving skills and the ability to architect solutions from the ground up.