Platform Engineer

2 months ago • 6-10 Years • DevOps

About the job

Job Description

Cosm seeks a Platform Engineer to design, implement, and monitor its operations center infrastructure. Expertise in Grafana, Prometheus, Loki, and Tempo is crucial, along with strong knowledge of cloud platforms such as Azure and AWS. Experience with virtualization and containerization technologies, such as Docker and Kubernetes, is essential.
Must have:
  • Grafana, Prometheus
  • Azure, AWS
  • Docker, Kubernetes
  • Platform engineering
Good to have:
  • Hyper-V, VMware
  • Pulumi, Terraform
  • Ansible, Puppet
  • Windows Server
Perks:
  • Hybrid Work
  • Global Company

Cosm is a global technology company that brings experiences to life in immersive environments. We help our partners create spaces and content that blur the line between the real and the virtual across three primary markets: Sports and Entertainment, Science and Education, and Parks and Attractions. Cosm was born from the fusion of some of the greatest innovators in the history of technology: Evans & Sutherland, Spitz, Inc., and Cosm Immersive combined forces as Cosm to power the immersive experiences of the future. Innovation is in our DNA.

Summary

As a Platform Engineer, you will play a pivotal role in designing, implementing, automating, and maintaining the technology infrastructure that supports our organization's operations center. You will be responsible for designing robust, scalable, and resilient platforms that facilitate real-time monitoring, analysis, and decision-making processes critical to our business and product operations.

You will liaise with product and engineering teams to ensure applications and microservices support telemetry ingestion for actionable alerting and historical data graphing, thus building a continuous feedback loop for platform and product reliability.

The ideal candidate is a solutions-oriented person who learns new technologies quickly and can become competent with all layers of the development platform. They should be willing to roll up their sleeves, be familiar with a variety of technologies, and know how to choose the best one for the job. Ideally, they are also familiar with SaaS, live entertainment and broadcast, and digital, tech, and streaming media. If you think you have the skills and are up for the challenge, consider this your calling.

Responsibilities

  • Monitoring and Alerting: Design and automate robust monitoring and alerting mechanisms to ensure the health, performance, and availability of the operations center platform, products and associated infrastructure components.
  • Application Monitoring: Work with software engineering and product teams to best understand how to monitor their applications and microservices.
  • Infrastructure Deployment: Collaborate with infrastructure teams to deploy and configure the necessary hardware and software components to support the operations center platform, including servers, networks, databases, and monitoring tools.
  • Documentation and Training: Create comprehensive documentation, diagrams, and guides to facilitate system understanding, troubleshooting, and knowledge transfer. Provide training and support to operations center staff on platform usage and best practices.
  • Collaboration and Stakeholder Management: Collaborate closely with cross-functional teams, including product, operations, IT, security, and business units, to understand requirements, gather feedback, and align observability platform architecture with organizational goals and priorities.
  • Incident Management: Participate in an on-call rotation to troubleshoot and resolve incidents, collaborating closely with the support team to ensure prompt resolution.
  • Continuous Learning: Stay informed about industry trends and emerging technologies related to Windows Server, on-premises infrastructure, and the Azure and AWS cloud platforms.
  • Leadership: Provide technical guidance and mentorship to junior team members as needed.
  • Communication: Demonstrate excellent written and verbal communication skills and the ability to deftly tailor technical communication to any audience.
  • Be Audacious: Push the limits, try new technologies, take calculated risks, swing for the fences, and proactively search for the best solutions and ideas in the marketplace.

Experience

  • Bachelor's or Master's degree in Computer Science, Information Technology, or a related field, or relevant work experience.
  • 6+ years of proven experience as a platform engineer, site reliability engineer, systems engineer or a similar role, with a focus on designing, implementing and monitoring the health of complex, distributed systems.
  • Expert-level knowledge of Grafana, Prometheus, Loki, and Tempo
  • Familiarity with scripting languages for automation and configuration management; PowerShell and Bash are paramount.
  • Strong understanding of cloud computing concepts and hands-on experience with Azure and/or AWS.
  • Experience with virtualization and containerization technologies such as Hyper-V, VMware, Amazon EC2, Docker, and Kubernetes.
  • Experience using Pulumi, Terraform and/or other IaC tools.
  • In-depth knowledge of Windows Server operating systems (2016/2019/2022), including installation, configuration, and troubleshooting.
  • Familiarity with Linux automation using tools such as Ansible or Puppet is a plus.
  • Expertise in data retrieval technologies, including constructing efficient PromQL, GraphQL & LogQL queries.
  • Solid understanding of networking principles and protocols.
  • Excellent problem-solving and troubleshooting skills, with a keen attention to detail.
  • Strong communication and interpersonal skills, with the ability to collaborate effectively with clients and team members.
  • Driven to automate your processes, test continually, and document your work.
  • You’re not afraid of an open, candid, and respectful work environment.
  • Experience working with cross-functional, distributed teams from concept through completion and subsequent iterations, including with agile methodologies.
  • Excellent time management skills.

Preferred Qualifications

  • Certifications in cloud platforms (e.g., AWS Certified Solutions Architect, Azure Solutions Architect) or similar.

Cosm is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

About The Company

Haryana, India (Hybrid)

Los Angeles, California, United States (On-Site)

Similar Jobs

WESURE INSURTECH SERVICES (INDIA) PRIVATE LIMITED - Senior Umbraco Developer

WESURE INSURTECH SERVICES (INDIA) PRIVATE LIMITED, India (Hybrid)

Hitachi - Azure Data Engineer (MS)

Hitachi, India (Remote)

DOTSOFT SA - Solutions Architect

DOTSOFT SA, Greece (On-Site)

Info Stretch - .Net Architect

Info Stretch, United States (On-Site)

Trend Micro - (Sr.) Software Engineer

Trend Micro, Taiwan (On-Site)

Social Discovery Group - Head of Development (Video Services)

Social Discovery Group, Serbia (Remote)

CyberArk - Senior Site Reliability Engineer

CyberArk, India (On-Site)

Aera Technology - Senior Platform Administration Engineer

Aera Technology, Romania (Hybrid)

Similar Skill Jobs

Ajmera Infotech - Engineering Lead

Ajmera Infotech, India (On-Site)

Scientific Games - Specialist Software Engineer

Scientific Games, India (On-Site)

Paypal - Staff Engineer, Backend (Java)

Paypal, United States (Hybrid)

Rocket - Principal Solutions Consultant

Rocket, United States (Remote)

ByteDance - Network Engineer

ByteDance, United States (On-Site)

Jobs in Gurugram, Haryana, India

CloudHire - Machine Learning - Engineer

CloudHire, India (Remote)

Shoppers Stop - Assistant Designer-Kidswear-Girls

Shoppers Stop, India (On-Site)

Neysa Networks - Senior Cloud Engineer

Neysa Networks, India (On-Site)

Nielsen Holdings - Senior DevOps Engineer

Nielsen Holdings, India (Hybrid)

Zeta - Specialist Process & Compliance

Zeta, India (On-Site)

Ampug Solutions - DevOps Engineer

Ampug Solutions, India (On-Site)

Simple Viral Games - Partnerships Manager

Simple Viral Games, India (On-Site)

Toppan Merrill - Site Reliability Engineer

Toppan Merrill, India (On-Site)

DevOps Jobs

Ness Digital - Architect - Offshore

Ness Digital, India (Hybrid)

PepsiCo - Architect - Voice Engineering

PepsiCo, India (On-Site)

Luxoft - Senior DevOps Engineer

Luxoft, India (Remote)

The Walt Disney Company - Lead Software Engineer, Scala

The Walt Disney Company, United States (On-Site)

Britive - Software Engineer (Cloud)

Britive, India (Remote)

Saama Technologies, Inc. - Senior Site Reliability Engineer

Saama Technologies, Inc., India (On-Site)

Brillio - DB Migration Engineer - R01531207

Brillio, India (Hybrid)

Warner Bros Discovery - Staff Software Engineer

Warner Bros Discovery, India (On-Site)
