Lead - Platform Engineering
Yodlee
Job Summary
As a Lead Platform Engineer, you will act as a subject matter expert (SME) for Linux systems, deploying, configuring, and maintaining Ubuntu-based environments globally. Responsibilities include developing automation scripts, implementing security compliance, troubleshooting performance issues, and providing expert support for on-prem, colocation, and AWS cloud infrastructure. You will design hybrid solutions, automate provisioning with tools like Ansible and Terraform, manage Hyper-V clusters, and support enterprise DNS. The role involves participating in high-severity incidents, liaising with vendors, and mentoring junior team members.
Job Description
Job Responsibilities
- Act as an SME (Subject Matter Expert) for Linux-based systems
- Deploy, configure, and maintain Linux-based systems (primarily Ubuntu) across global datacenter environments
- Develop and maintain scripts (Bash, Python, or similar) for system administration, monitoring, and operational tasks
- Implement and manage patching, hardening, and security compliance for Linux systems
- Troubleshoot performance issues and optimize Linux environments for scalability and reliability
- Provide expert-level support to deploy, maintain, troubleshoot, and upgrade infrastructure in on-prem and colocation datacenter spaces, as well as cloud services such as AWS
- Design and implement hybrid solutions that integrate on-premises environments with AWS cloud services for scalability and resilience
- Automate provisioning and configuration of resources using tools such as Ansible, CloudFormation, or Terraform
- Provide support for Cisco UCS and traditional Dell and HP servers
- Configure, monitor, maintain, and upgrade multiple large Hyper-V clusters
- Provide support for large-scale enterprise DNS
- Participate in high-severity incidents, take ownership, and lead the charge in troubleshooting, resolution, and root cause analysis
- Become familiar with Jira workflows, ticketing procedures, and implementations
- Liaise with vendors/business units to build and document infrastructure environments
- Provide expert advice, critically examine infrastructure and processes, and introduce/follow best practices to meet or exceed high availability, reliability, security, and industry compliance
- Foster team collaboration, regularly and generously share knowledge, and participate in mentoring/upskilling junior team members
- Participate in on-call rotations
Required Skills / Experience
- Strong hands-on experience with Linux administration (Ubuntu preferred)
- Familiarity with Linux networking, system performance tuning, and troubleshooting
- Experience with package management, kernel updates, and system hardening
- Expert-level knowledge of core systems concepts such as DNS, TCP/IP, DHCP, operating systems, virtualization, and SSL certificates/PKI
- Hands-on experience with AWS services (EC2, VPC, IAM, S3, CloudWatch) and hybrid cloud integration
- Familiarity with Infrastructure-as-Code tools for AWS (CloudFormation, Terraform)
- Proficiency in automation tools such as Ansible for configuration management
- Scripting skills in Bash, Python, or similar languages for automation and operational efficiency
- Experience working with a large enterprise infrastructure footprint and multi-forest Active Directory (AD) environments
- Experience with Windows technologies such as IIS, DFS, DHCP, Windows-based DNS, certificate authorities, and file shares
- Expert-level knowledge of and extensive experience in virtualization technologies, such as Microsoft Hyper-V
- Knowledge of and experience working with datacenter storage from various providers, including Dell, Pure, and HPE
- Experience with storage concepts such as Fibre Channel zoning, iSCSI, CIFS, NFS, and block configuration
- Experience in backup technologies, including, but not limited to, Veeam, Commvault, and Cohesity
- Experience working in large, geographically dispersed, multi-datacenter infrastructure and operations environments
- Knowledge of orchestration, compute, storage, and networking concepts
- Knowledge of Jira, Scrum, Sprint, and Kanban concepts