Lead Site Reliability Engineer

1 Month ago • 7 Years +

Job Summary

Job Description

The Site Reliability Engineer (SRE) role at Salesforce involves combining software and systems engineering to build and maintain large-scale, distributed, fault-tolerant systems. The SRE team ensures services have the necessary reliability, capacity, and performance. Key responsibilities include supporting and scaling multi-cloud, multi-region services, building automation, operating monitoring and alerting systems, improving CI/CD practices, defining and implementing SLIs/SLOs, and leading post-incident analysis. The role requires collaboration within Agile teams and a data-driven approach to drive platform improvements. This involves a deep understanding of complex systems and the ability to identify and address instability.
Must have:
  • 7+ years of experience in Python, Go, or Java for automation.
  • Experience designing and operating large-scale distributed systems.
  • Experience in developing and deploying production-grade software.
  • Ability to contribute to codebase improvements for reliability.
  • Strong understanding of software engineering best practices.
  • Excellent knowledge of Internet technologies and protocols.
  • Ability to address sources of instability in distributed systems.
  • Strong experience with API fundamentals (SOAP, REST).
  • Experience in Public Cloud environments and Kubernetes.
  • Knowledge of microservices, service mesh, and zero-trust infrastructure.
  • Solid knowledge of large-scale complex systems from a reliability perspective.
  • Experience with large-scale SDLC pipelines.
  • Strong Linux systems knowledge and troubleshooting skills.
  • Experience in fault modeling, chaos engineering, and load testing.
Good to have:
  • Experience operating in global, multi-tenant, or compliance-sensitive environments.
  • Understanding of SRE principles: SLIs/SLOs, availability, and incident metrics.
  • Data-driven mindset for improving service reliability.
  • Experience designing and implementing observability solutions.
  • Strong written and verbal communication, with emphasis on documentation.
  • Experience integrating AI-driven automation and observability.

Job Details

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Software Engineering

About Salesforce

We’re Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And we empower you to be a Trailblazer, too, driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good – you’ve come to the right place.

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Salesforce services have the reliability, capacity, performance, and availability to meet our customers’ needs, and that they improve at the rate our customers expect.

Our software development focuses on enabling service owners to operate their services safely at scale, whether through paved-path integrations onto observability frameworks, optimizing existing systems, designing infrastructure, or eliminating work through AI/ML. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale that are unique to Salesforce, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design.

SRE’s culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

Required Skills

  • 7+ years of experience in Python, Go, or Java for automation, tooling, and integration.

  • Hands-on experience designing, building, and operating large-scale distributed systems, and identifying their shortcomings and optimization opportunities.

  • Demonstrated experience in developing and deploying production-grade software applications or services.

  • Proven ability to contribute directly to application codebase improvements for reliability and scalability.

  • Strong understanding of software engineering best practices, including design patterns, testing methodologies, and code reviews, applied in a production environment.

  • Excellent knowledge of Internet technologies and protocols (TCP/IP, DNS, HTTP, SSL, etc.).

  • Ability to locate and address sources of instability in high-traffic, large-scale distributed systems.

  • Strong experience with API fundamentals (SOAP, REST).

  • Experience in Public Cloud environments, Kubernetes and modern container orchestration.

  • Knowledge of microservices, service mesh, and zero-trust infrastructure.

  • Solid knowledge of large-scale complex systems from a reliability and availability perspective.

  • Hands-on experience with large-scale SDLC pipelines.

  • Strong Linux systems knowledge and troubleshooting skills.

  • Experience in fault modeling and fault tolerance, chaos engineering, and performance and load testing.

Responsibilities

  • Support and scale multi-cloud, multi-region services.

  • Build automation and self-healing capabilities to reduce manual operations.

  • Operate and scale monitoring, alerting, and tracing systems for proactive detection.

  • Improve CI/CD practices to accelerate safe, frequent deployments.

  • Define and implement SLIs/SLOs with engineering teams, driving reliability into system architecture.

  • Collaborate on integrating AI-driven automation and observability to enhance reliability.

  • Work within Agile teams, participating in Scrum ceremonies and iterative delivery.

  • Lead post-incident analysis, conduct postmortems, and ensure effective root cause resolution.

  • Use data to uncover trends, inform prioritization, and drive platform improvements.

Desired Skills

  • Experience operating in global, multi-tenant, or compliance-sensitive environments.

  • Understanding of SRE principles: SLIs/SLOs, availability, resiliency, and incident metrics (TTD, TTR).

  • Data-driven mindset for identifying systemic issues and improving service reliability.

  • Experience designing and implementing observability solutions.

  • Strong written and verbal communication, with emphasis on documentation and knowledge sharing.

  • Experience building and integrating AI-driven automation and observability to enhance reliability.

Accommodations

If you require assistance due to a disability when applying for open positions, please submit a request via the Accommodations Request Form.

Posting Statement

Salesforce is an equal opportunity employer and maintains a policy of non-discrimination with all employees and applicants for employment. What does that mean exactly? It means that at Salesforce, we believe in equality for all. And we believe we can lead the path to equality in part by creating a workplace that’s inclusive, and free from discrimination. Know your rights: workplace discrimination is illegal. Any employee or potential employee will be assessed on the basis of merit, competence and qualifications – without regard to race, religion, color, national origin, sex, sexual orientation, gender expression or identity, transgender status, age, disability, veteran or marital status, political viewpoint, or other classifications protected by law. This policy applies to current and prospective employees, no matter where they are in their Salesforce employment journey. It also applies to recruiting, hiring, job assignment, compensation, promotion, benefits, training, assessment of job performance, discipline, termination, and everything in between. Recruiting, hiring, and promotion decisions at Salesforce are fair and based on merit. The same goes for compensation, benefits, promotions, transfers, reduction in workforce, recall, training, and education.

