Staff Cloud Infrastructure Engineer
Stem, Inc.
Job Summary
Stem is seeking a Staff Cloud Infrastructure Engineer to design, build, and operate highly reliable, real-time backend systems that power critical energy infrastructure. This role focuses on large-scale distributed systems, cloud-native infrastructure, and operational excellence in always-on, mission-critical environments. The engineer will own and enhance distributed systems for data ingestion, streaming, analytics, and control pipelines, supporting grid-scale energy operations and integrating with industrial control protocols.
Job Description
Role Overview
Stem is seeking a Staff Cloud Infrastructure Engineer to design, build, and operate highly reliable, real-time backend systems that power critical energy infrastructure. This role focuses on large-scale distributed systems, cloud-native infrastructure, and operational excellence in always-on, mission-critical environments.
Key Responsibilities
- Design, build, and operate highly available cloud infrastructure supporting real-time, mission-critical workloads
- Own and enhance distributed systems that power data ingestion, streaming, analytics, and control pipelines
- Support real-time data streaming, alerting, and analytics platforms used for grid-scale energy operations
- Design and operate systems that integrate with SCADA and industrial control protocols (e.g., Modbus, DNP3)
- Build and maintain comprehensive observability solutions, including metrics, logging, and distributed tracing
- Participate in on-call and pager-duty rotations, ensuring high reliability and rapid incident response
- Collaborate closely with product, data, and application engineering teams to deliver scalable and resilient solutions
- Drive architectural decisions with a focus on scalability, performance, security, and operational excellence
Required Qualifications
- B.S./M.S. in Computer Science or related field, or equivalent experience.
- 8+ years of experience in cloud infrastructure, backend systems, or distributed systems engineering
- Strong programming experience in Java and Python; C++ experience is a plus
- Deep understanding of distributed systems principles, including consistency models, fault tolerance, and scalability
- Hands-on production experience operating Kubernetes-based, containerized workloads
- Experience with real-time and streaming data platforms (e.g., Spark, Flink)
- Solid experience with data storage technologies, including:
  - Relational (SQL) databases
  - NoSQL datastores
  - Search platforms such as Elasticsearch
- Proven experience supporting mission-critical, always-on production systems
- Strong background in designing and operating monitoring and observability platforms
Preferred Qualifications
- Experience with SCADA systems or industrial communication protocols (Modbus, DNP3)
- Background in energy systems, grid infrastructure, or industrial IoT environments
- Experience with data visualization and analytics platforms such as Grafana or Power BI
- Experience building or operating low-latency, high-throughput data pipelines
- Prior experience supporting regulated or safety-critical systems
- Hands-on experience with AWS cloud services and infrastructure
Core Technologies and Platforms
- Java, Python (C++ a plus)
- Kubernetes and cloud-native infrastructure
- Real-time streaming and processing frameworks (Spark, Flink)
- SQL databases, NoSQL datastores, and Elasticsearch
- Observability platforms (metrics, logging, tracing)
- Grafana, Power BI
- SCADA and industrial control system integrations
Salary Range
$145,360.00 - $218,040.00
What We Offer
At Stem, you will work in a growing, innovative, mission-driven company with talented colleagues who have a passion for building renewable energy systems. Stem offers competitive compensation as well as a comprehensive set of benefits to support the health and wellness of our employees, including:
- A competitive compensation package, including eligibility for a bonus or commission based on the role, and equity
- Full health benefits on the first day of employment (several medical plan options, including HDHP and PPO; dental plans; FSA/HSA with employer contribution; employer-paid vision, LTD, STD, and life insurance; and a variety of voluntary coverage)
- 401(k) (pre- or post-tax) on the first day of employment
- 12 paid calendar holidays per year
- Flexible time off