Senior Critical Infrastructure Platform Services Manager

1 Month ago • 3-8 Years • DevOps • Operations

Job Summary

Job Description

The Senior Critical Infrastructure Platform Services Manager at Microsoft will lead a team responsible for designing, engineering, and operating critical infrastructure across global datacenters. Responsibilities include demand planning, capacity utilization, managing physical infrastructure (supply chain, hardware, power, security, workflow teams), and ensuring SLAs and KPIs are met. This role requires expertise in platform technologies (Windows/Linux OS, Hyper-V, Active Directory, DNS, PowerShell, data protection), architectural design of highly available and resilient systems, and managing engineering projects using Agile and DevOps methodologies. The manager will also focus on operational excellence, continuous improvement through automation, and ensuring team compliance with security and privacy standards. Strong people management and collaboration skills are essential.
Must have:
  • 3+ years' experience in large-scale cloud/distributed systems
  • 3+ years people management experience
  • Expertise in platform technologies (Windows/Linux, Hyper-V, Active Directory)
  • Strong project management skills using Agile & DevOps
  • Operational excellence & continuous improvement focus
Good to have:
  • MIS, MCSE/MCSD, or other engineering certifications
  • Experience working in large-scale production datacenters
Perks:
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

Job Details

Overview

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day, and we need you as a Senior Critical Infrastructure Platform Services (CIPS) Manager.

Microsoft’s Cloud Operations & Innovation (CO+I) is the engine that powers our cloud services. As a Senior (CIPE) Platform Engineering Manager, you will play a key role in delivering the core infrastructure and foundational technologies for Microsoft's online services, including Bing, Office 365, Xbox, OneDrive, and the Microsoft Azure platform. You will manage a team that is responsible for the design, engineering, and operation of the CIPE infrastructure across our unified global datacenters; for demand planning and capacity utilization; and for running the physical infrastructure (including supply chain, hardware, power, security, and workflow teams). We emphasize automation, data-driven engineering, cost-effectiveness, and environmental sustainability. This opportunity will allow you to join and manage a team of Platform Engineers who are passionate about designing, building, and operating the world's most advanced cloud infrastructure. You will work on cutting-edge technologies and projects that enable Microsoft to deliver innovative solutions and services to our customers while collaborating with other teams across the company and learning from the best in the industry. If you are looking for a challenging and rewarding career, this is the role for you. This is a flexible role that offers remote work from home.

Our infrastructure comprises a large global portfolio of more than 100 datacenters and 1 million servers. Our foundation is built upon and managed by a team of subject matter experts working to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide.

With environmental sustainability and optimization at the forefront of our datacenter design and operations, we continue to grow and evolve as we meet the ever-changing business demands that hold Microsoft as a world-class cloud provider.    

Do you want to empower billions across the world? Come and join us in CO+I and be at the forefront of the action! 

Qualifications

Required Qualifications:   

  • Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, service engineering, or systems engineering OR equivalent experience.

Additional or Preferred Qualifications: 

  • Bachelor's Degree in Computer Science, Information Technology, or related field AND 8+ years technical experience in software engineering, network engineering, service engineering, or systems engineering OR equivalent experience.
  • 3+ years technical experience working with large-scale cloud or distributed systems. 
  • 3+ years people management experience. 
  • Management Information Systems (MIS), Microsoft Certified Solutions Expert (MCSE)/Microsoft Certified Solutions Developer (MCSD), or other Engineering Certifications. 
  • 3+ years’ experience working in or supporting large-scale production datacenters.

Responsibilities

People Management and Support 

  • Managers deliver success through empowerment and accountability by modeling, coaching, and caring. 
  • Model - Live our culture; Embody our values; Practice our leadership principles. 
  • Coach - Define team objectives and outcomes; Enable success across boundaries; Help the team adapt and learn. 
  • Care - Attract and retain great people; Know each individual’s capabilities and aspirations; Invest in the growth of others. 

Technical Knowledge and Expertise 

  • Develops end-to-end expertise in service and/or system design, interactions between technology layers and components, functions of infrastructure, and dependencies at scale. Develops team's end-to-end technical expertise, regularly identifying skill gaps and raising the collective bar on the team's skill set in alignment with industry standards. Takes ownership of service design by driving efforts within an organization to identify, define, recommend, and build optimal configurations of technology solutions with consideration for cost management. Adjusts configurations and defines infrastructures to improve the availability, reliability, efficiency, observability, and/or performance of supported products and services. Leverages technical expertise to identify, design, deliver, and operate solutions across organizations. Drives reviews with the engineering teams that develop and/or manage services, identifying opportunities for efficiencies in operations and sharing learnings and recommendations across engineering teams working on related services within their organization.
  • Guides teams to stay current in knowledge and expertise as the technology landscape evolves, maintaining awareness of industry norms. Uses knowledge to drive the adoption of new solutions across engineering teams working with related products within an organization. Makes expertise available to others through sharing, coaching, conferences, and other means to drive improvements across teams. 
  • Provide Subject Matter Expertise in the deployment of various platform technologies at scale (Windows OS, Linux OS, Hyper-V, Active Directory, DNS, PowerShell scripting, data protection technologies and more) and support high priority needs of the business on a variety of special projects which often involve expedited deliverables, operational agility, and impact to operational functions. 
  • Perform technical analysis and lead the design architecture of critical infrastructure platform plans for the enterprise, analyzing platform & security requirements of the business and architecting solutions that include highly available and resilient designs.
  • Own and drive technical roadmap and strategy, through effective collaboration with architects, engineers, and senior leaders. 

Operational Excellence 

  • Support platform infrastructure, ensure SLAs and KPIs are met and maintained, maintain & improve security posture of environment, execute OS and firmware patching, provide 24/7/365 service and infrastructure operations support, maintain compliance, support builds and deployments, end of life refreshes, and service optimization. 
  • Execute continuous improvement in the environment to automate manual processes, improve security, availability, and overall support for critical infrastructure in the environment.  
  • Manages teams of engineers to implement reliable, scalable, and high-performance solutions across teams. Contributes to design documents. Owns implementation and rollback plans. Maintains quality checklists and related documentation, unblocking as needed.
  • Holds the team accountable for creating, monitoring, and taking action on telemetry data and provides guidance on telemetry analytics to better identify patterns that reveal errors and unexpected problems that are affecting the system availability, reliability, performance, and/or efficiency. Manages the development of scripts and/or automation across a team and leverages an understanding of solutions to define, develop, measure, track, change, and improve the quality of telemetry pipelines that support automated monitoring and incident response. 
  • Holds team accountable for participation in on-call rotations and manages teams of Service Engineers responding to incidents to identify the level of impact, troubleshoot issues, and deploy appropriate fixes to resolve root cause(s) and prevent incident recurrence across related products. Ensures that Service Engineers within their organization have the technical knowledge and resources required to respond to incidents and make difficult decisions based on business impact. Ensures relevant engineering teams, stakeholders, and leaders are alerted to customer-impacting issues. Ensures major issues are escalated to other teams as needed. Ensures postmortems are conducted. Ensures key details related to incidents and their resolution are shared through postmortem reports and regular review meetings. Provides clarity during incidents, helps determine impact and define the scope of severity, and facilitates development of incident response and resolution guidance.
  • Holds team accountable for understanding and following prescriptive guidance for security, privacy, and compliance standards in alignment with direction from the business and technical experts. Develops team's compliance awareness by conducting training and disseminating relevant information. Guides team to identify patterns of violations and implement automations for prevention. Works with security, privacy, and compliance teams to identify and address relevant security, privacy, and compliance issues across teams. 

Project Management 

  • Lead and manage platform engineering projects from inception to completion using Agile methodologies and DevOps-based tools, ensuring alignment with organizational goals and priorities. 
  • Oversee sprint planning, backlog grooming, daily stand-ups, and manage timelines and budgets, ensuring effective collaboration and communication within the team. 

Collaboration and Knowledge Sharing 

  • Drives collaboration across teams by promoting the open exchange of information, resolving issues within and beyond their immediate team, managing conflict and teamwork challenges, and removing barriers to enable teams to quickly shift priorities without losing productivity. Identifies and includes all stakeholders in decisions and represents their organization with partners, customers, and external stakeholders, maintaining active engagement so issues can be resolved and mutual objectives are met. Ensures information is systematically and clearly communicated across teams. 
  • Facilitates sharing of insights and best practices that can be applied to improve development and operations across related sets of systems, platforms, and/or products. Continues to develop their understanding of insights and best practices through interactions with more experienced Service Engineers, members of product engineering teams, and other resources (e.g., conferences, brown bags, wikis, documentation). Mentors and coaches other engineers to help them identify and propose relevant solutions. 

Other 

  • Embody our culture and values.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
