As a Lead Platform Engineer, you will act as an SME for Linux systems, deploying, configuring, and maintaining Ubuntu-based environments globally. Responsibilities include developing automation scripts, implementing security compliance, troubleshooting performance issues, and providing expert support for on-prem, colocation, and AWS cloud infrastructure. You will design hybrid solutions, automate provisioning with tools like Ansible and Terraform, manage Hyper-V clusters, and support enterprise DNS. The role involves participating in high-severity incidents, liaising with vendors, and mentoring junior team members.
Must Have:
- Act as an SME for Linux-based systems.
- Deploy, configure, and maintain Linux-based systems (Ubuntu) across global datacenter environments.
- Develop and maintain scripts (Bash, Python) for system administration, monitoring, and operational tasks.
- Implement and manage patching, hardening, and security compliance for Linux systems.
- Troubleshoot performance issues and optimize Linux environments for scalability and reliability.
- Provide expert-level support to deploy, maintain, troubleshoot, and upgrade infrastructure in on-prem and colocation datacenter spaces, as well as cloud services such as AWS.
- Design and implement hybrid solutions that integrate on-premises environments with AWS cloud services for scalability and resilience.
- Automate provisioning and configuration of resources using tools such as Ansible, CloudFormation, or Terraform.
- Provide support for Cisco UCS and traditional Dell and HP servers.
- Configure, monitor, maintain, and upgrade multiple large Hyper-V clusters.
- Provide support for large-scale enterprise DNS.
- Participate in high-severity incidents, take ownership, and lead the charge in troubleshooting, resolution, and root cause analysis.
- Become familiar with Jira workflows, ticketing procedures, and implementations.
- Liaise with vendors/business units to build and document infrastructure environments.
- Provide expert advice, critically examine infrastructure and processes, and introduce/follow best practices to meet or exceed high availability, reliability, security, and industry compliance.
- Foster team collaboration, regularly and generously share knowledge, and participate in mentoring/upskilling junior team members.
- Participate in on-call rotations.
- Strong hands-on experience with Linux administration (Ubuntu preferred).
- Expert-level knowledge of core systems concepts such as DNS, TCP/IP, DHCP, operating systems, virtualization, and SSL certificates/PKI.
- Hands-on experience with AWS services (EC2, VPC, IAM, S3, CloudWatch) and hybrid cloud integration.
- Proficiency in automation tools such as Ansible for configuration management.
- Scripting skills in Bash, Python, or similar languages for automation and operational efficiency.
- Expert-level knowledge of and extensive experience in virtualization technologies, such as Microsoft Hyper-V.
- Experience in backup technologies, including but not limited to Veeam, Commvault, and Cohesity.