Senior Telemetry Data Engineer

1 Hour ago • 4-8 Years • Data Analyst • DevOps

About the job

Job Description

The Senior Telemetry Data Engineer at Microsoft's Cloud Operations + Innovation (CO+I) team will design and deliver automated solutions for monitoring and alerting on data center critical environment resources. This role involves working with massive amounts of real-time data, leveraging machine learning models for anomaly detection, and utilizing cutting-edge technologies within a Lakehouse architecture. Responsibilities include designing telemetry data ingestion and processing systems, implementing anomaly detection systems, defining data models, and ensuring high-frequency, low-latency data pipelines. The engineer will collaborate with cross-functional teams, ensuring interoperability and high coverage of data center signals. Experience with KQL, Python, GoLang, or Spark is required, along with expertise in processing data from networking protocols.
Must have:
  • Subject matter expertise in machine learning models for anomaly detection
  • Experience with data lakes and real-time data streams
  • Proficiency in KQL, Python, GoLang, or Spark
  • Experience building AI/ML applications for IT operations
  • Bachelor's or master's degree in a related field
Good to have:
  • Experience designing telemetry systems for data center networks
  • Familiarity with HVAC, CRAC, AHU, Chillers, and other critical environment equipment
  • Knowledge of incident management and data center operations
  • Cloud computing certifications
Perks:
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

Overview

Microsoft is on a mission to empower every person and every organization on the planet to achieve more. Our culture is centered on embracing a growth mindset, a theme of inspiring excellence, and encouraging teams and leaders to bring their best each day. In doing so, we create life-changing innovations that impact billions of lives around the world. You can help us achieve our mission.       

 

Cloud Operations + Innovation (CO+I) is the engine that powers Microsoft’s core cloud platforms and services that millions of people use every day. With more than 95% of Fortune 500 businesses on Azure, 180 million using Office 365, and millions using other services (all running on Microsoft's cloud infrastructure), CO+I builds and operates the foundation upon which Microsoft’s mission to empower every person and organization comes to life.

 

Are you passionate about cloud computing? Do you get excited about taking a hands-on approach to transforming Microsoft’s most critical business through investigation, data analysis, and automation? If so, come and help us build the most reliable and efficient datacenter infrastructure on the planet. The CO+I Critical Environment Systems Intelligence (CESI) team is responsible for designing and delivering solutions to support global datacenter operations and to improve availability. CESI is helping to drive CO+I’s transition to a customer-centric, data-driven, observability-based, live-service culture. As a Data Engineer, you will be a key player in this transition.

 

As a Senior Data Engineer on the CO+I Critical Environment Service Intelligence (CESI) team, you partner and collaborate on the design and delivery of automated solutions to monitor, detect, and alert on data center critical environment mechanical and electrical resources. You will collaborate with other CO+I teams to contribute to and benefit from their work, ensuring that we are constantly improving across the fleet. You will work with massive amounts of data with low latency requirements across cutting-edge technologies, with the potential for significant impact on both internal partners and external customers.

 

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day. 

Qualifications

Qualifications: 

  • Subject matter expertise in supervised and unsupervised machine learning models for anomaly detection. 
  • Demonstrated subject matter expertise in utilizing data lakes within Lakehouse architectures to process, aggregate, and manage real-time data streams from cloud-based services. 
  • Demonstrated subject matter expertise in managing and processing large-scale data formats, with a focus on real-time serialization and deserialization to ensure low latency during data handling. This includes advanced proficiency and experience in Kusto Query Language (KQL), and proficiency in coding with Python, GoLang, or Spark. 
  • Experience with generative AI or Copilots for troubleshooting data center environments. 
  • Expertise in processing data frames from networking layers and protocols, including BGP, TCP/IP, and GPRS Tunneling Protocol (GTP). 
  • Proven experience in building applications using artificial intelligence (AI) techniques, including machine learning (ML) and data science, to enhance and automate various IT operations (AIOps).  
  • Bachelor's or master’s degree in computer science, data engineering, or a related field. 
  • Excellent problem-solving skills and attention to detail. 
  • Ability to work collaboratively with cross-functional teams. 
  • Strong written and verbal communication skills. 

Preferred Qualifications: 

  • In-depth experience in designing and implementing telemetry systems for data center networks. 
  • Familiarity with HVAC, CRAC, AHU, Chillers, and other critical environment equipment. 
  • Knowledge of incident management and data center operations. 
  • Certified knowledge of cloud computing. 

 

About Us: We are committed to maintaining the highest standards of operational excellence in our data centers. Join us in our mission to enhance our telemetry capabilities and ensure the reliability and efficiency of our critical environments. 

 

 

Background Check Requirements: 

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.  

 

#COICareers  

 

Responsibilities

As a Senior Data Engineer in DC Critical Environments, you will: 

  • Drive a program that covers end-to-end monitoring and processing of critical environment (CE) infrastructure telemetry for all leased sites, to bring those sites on par with owned datacenter sites.  
  • Design and implement telemetry data ingestion and data processing systems for leased sites. 
  • Prototype, pilot, and deploy multi-signal anomaly detection and prevention systems leveraging machine learning and statistical analysis for DC leased sites.  
  • Define and drive an operationalization plan for the telemetry pipeline for leased sites. 
  • Ensure interoperability of detection methods, systems, and workflows by defining conceptual, logical, and physical data models. 
  • Understand the signals coming from the EPMS and BAS systems for leased sites. 
  • Ensure a high percentage of coverage and mapping of leased site signals, including thermal, power, and other environmental conditions and data. 
  • Define a set of reusable primitives for mapping the logical and physical topology of data center leased sites. 
  • Ensure there is a high-frequency, high-volume, low-latency pipeline capable of streaming and micro-batching to process DC CE telemetry from leased sites.  
  • Architect a staging model to ensure the onboarding of leased-site CE telemetry (thermal, power, and other environmental subjects). 

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
