Lead - Platform Engineering

Yodlee

Trivandrum, Kerala, India (On Site) | Full Time | 3 months ago

Job Summary

As a Lead Platform Engineer, you will act as an SME for Linux systems, deploying, configuring, and maintaining Ubuntu-based environments globally. Responsibilities include developing automation scripts, implementing security compliance, troubleshooting performance, and providing expert support for on-prem, colocation, and AWS cloud infrastructure. You will design hybrid solutions, automate provisioning with tools like Ansible and Terraform, manage Hyper-V clusters, and support enterprise DNS. The role involves participating in high-severity incidents, liaising with vendors, and mentoring junior team members.

Job Description

Job Responsibilities

Act as an SME (Subject Matter Expert) for Linux-based systems
Deploy, configure, and maintain Linux-based systems (primarily Ubuntu) across global datacenter environments
Develop and maintain scripts (Bash, Python, or similar) for system administration, monitoring, and operational tasks
Implement and manage patching, hardening, and security compliance for Linux systems
Troubleshoot performance issues and optimize Linux environments for scalability and reliability
Provide expert-level support to deploy, maintain, troubleshoot, and upgrade infrastructure in on-prem and colocation datacenter spaces, as well as in cloud services such as AWS
Design and implement hybrid solutions that integrate on-premises environments with AWS cloud services for scalability and resilience
Automate provisioning and configuration of resources using tools such as Ansible, CloudFormation, or Terraform
Provide support for Cisco UCS and traditional Dell and HP servers
Configure, monitor, maintain, and upgrade multiple large Hyper-V clusters
Provide support for large-scale enterprise DNS
Participate in high-severity incidents, take ownership, and lead the charge in troubleshooting, resolution, and root cause analysis
Become familiar with Jira workflows, ticketing procedures, and implementations
Liaise with vendors/business units to build and document infrastructure environments
Provide expert advice, critically examine infrastructure and processes, and introduce/follow best practices to meet or exceed high availability, reliability, security, and industry compliance
Foster team collaboration, regularly and generously share knowledge, and participate in mentoring/upskilling junior team members
Participate in on-call rotations

Required Skills / Experience

Strong hands-on experience with Linux administration (Ubuntu preferred)
Familiarity with Linux networking, system performance tuning, and troubleshooting
Experience with package management, kernel updates, and system hardening
Expert-level knowledge of core systems concepts such as DNS, TCP/IP, DHCP, operating systems, virtualization, and SSL certificates/PKI
Hands-on experience with AWS services (EC2, VPC, IAM, S3, CloudWatch) and hybrid cloud integration
Familiarity with Infrastructure-as-Code tools for AWS (CloudFormation, Terraform)
Proficiency in automation tools such as Ansible for configuration management
Scripting skills in Bash, Python, or similar languages for automation and operational efficiency
Experience working with a large enterprise infrastructure footprint and multi-forest Active Directory (AD) environments
Experience with Windows technologies such as IIS, DFS, DHCP, Windows-based DNS, Certificate Authority, and file shares
Expert-level knowledge of and extensive experience in virtualization technologies, such as Microsoft Hyper-V
Knowledge of and experience working with datacenter storage from various providers, including Dell, Pure, and HPE
Experience with storage concepts such as Fibre Channel zoning, iSCSI, CIFS, NFS, and block configuration
Experience in backup technologies, including, but not limited to, Veeam, Commvault, and Cohesity
Experience working in infrastructure and operations across multiple large, geographically dispersed datacenters
Knowledge of orchestration, compute, storage, and networking concepts
Knowledge of Jira, Scrum, Sprint, and Kanban concepts