Associate Site Reliability Engineer

20 minutes ago • All levels • $133,000 PA
Devops

Job Description

Salesforce is seeking an engineering candidate to join the Site Reliability organization. This team monitors cloud service availability, swiftly repairs service-impacting issues, and ensures the Salesforce cloud and customers are protected 24/7. Responsibilities include detecting and resolving incidents, proactively addressing issues, and contributing to data security through monitoring, automation, and resiliency initiatives. The role involves tactical operations and large-scale production engineering.
Good To Have:
  • Prior Chef/Puppet or automated deployment experience
  • Prior Jenkins/Bamboo/Spinnaker pipeline execution experience
  • Experience in supporting and maintaining monitoring and alert systems
  • Experience in supporting and maintaining Java applications
  • Hands-on experience configuring and running AWS (Amazon Web Services), using the CLI/SDKs
  • Certifications in Linux+, Red Hat, and AWS
  • Experience in supporting and leading Kubernetes-based applications and services
  • Familiarity with Agile processes and DevOps
  • Experience taking part in blameless retrospectives and post-incident investigations
  • Working knowledge of and interest in resilience engineering
Must Have:
  • Maintain top performance of customer-facing services
  • Act in key support roles during major incidents (Sev0, Sev1)
  • Participate in technical review of incidents
  • Participate in RCAs and hand off to Global Solutions team
  • Ensure work aligns with company compliance policy
  • Solve technical issues and customer concerns
  • Automate detection and resolution of recurring issues
  • Improve processes to reduce operational toil
  • Strong background in Computer Science or engineering
  • Systems engineering experience in enterprise internet services
  • Expertise in TCP/IP and networking protocols
  • Expertise in CLI support for Unix/Linux (Red Hat, Solaris, BSD)
  • Strong understanding of monitoring security systems
  • Experience in Incident Management and ITIL service operations
  • Willingness to work in a 24/7 team managing data centers
  • Availability for shift work and on-call duties
  • Experience with AWS/C2S infrastructure and systems
  • Experience writing scripts in Python, Go, or other languages
Perks:
  • Time off programs
  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Mental health support
  • Paid parental leave
  • Life and disability insurance
  • 401(k)
  • Employee stock purchasing program

About Salesforce

Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn’t a buzzword — it’s a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.

Ready to level-up your career at the company leading workforce transformation in the agentic era? You’re in the right place! Agentforce is the future of AI, and you are the future of Salesforce.

This role is open only to current or prior Salesforce interns. Applicants who do not meet this criterion will not be considered.

This candidate must be a U.S. citizen (U.S. born or naturalized) operating on U.S. soil who does not hold dual citizenship and has the ability to meet customer and government screening standards applicable to this role, including a Criminal Justice Information Services screening with fingerprint scan. Due to the citizenship requirements for this role, which supports U.S. federal, state, and/or local government customers, citizenship will be verified through two of the following REAL ID Act documents: U.S. Passport, Passport Card, REAL Driver’s License, Global Entry Card, U.S. Government CAC/PIV. You agree to complete a Minimum Background Investigation (MBI) for a Moderate Public Trust position with the U.S. federal government and gain other clearances as deemed appropriate for the role.

Salesforce is seeking an engineering candidate to join the Site Reliability organization. Working closely with counterparts in the Infrastructure and R&D organizations, this organization provides a team of engineers monitoring cloud service availability and ready to swiftly repair any service-impacting issues. Seven days a week, 24 hours a day, the Site Reliability team keeps the Salesforce cloud and our customers protected. As a member of the Site Reliability team, you will be responsible for the primary task of detecting and resolving incidents within minutes. This objective is met by monitoring the services, reacting to problems, and proactively addressing issues before they affect performance or availability.

The team contributes to the customer and Salesforce by securing data through monitoring, automation, self-healing and resiliency initiatives, destructive testing, and game day exercises. The incumbent in this role would demonstrate a strong focus on tactical operations, as well as large-scale production engineering and orchestration.

GovCloud Observability Team

Our team motto is "See Something, Say Something." Our general responsibility is making sure that proper monitoring and alerting are in place to protect the availability of the GovCloud infrastructure and the services that run on it. We identify gaps in our monitoring and continuously improve both the accuracy of alerts and the tooling used to triage with metrics and logs.

Role Description:

  • Keep the customer-facing services available at top performance by maintaining the constant health of the supporting systems.
  • Incident management: act in key support roles during major incidents (e.g., Sev0, Sev1), and participate in the technical review of each incident for problem management
  • Problem management: populate and participate in RCAs and hand them off to the Global Solutions team
  • Ensure that work carried out by the Site Reliability team stays in sync with the company’s internal compliance policy and directives
  • Bring a passion for solving technical issues and customer concerns, working with other technical staff as the need arises
  • Work with and lead other members of the team in staying on top of key industry innovation and technology, and assist in team development and growth
  • Operate in a fast-paced environment, solve sophisticated issues quickly, and successfully balance multiple priorities
  • Work to automate detection and resolution of recurring issues in the production environment
  • Help create and improve current processes to reduce operations and engineering toil

Minimum Requirements:

  • This candidate must be a U.S. citizen (U.S. born or naturalized) operating on U.S. soil who does not hold dual citizenship and has the ability to meet customer and government screening standards applicable to this role, including a Criminal Justice Information Services screening with fingerprint scan. Due to the citizenship requirements for this role, which supports U.S. federal, state, and/or local government customers, citizenship will be verified through two of the following REAL ID Act documents: U.S. Passport, Passport Card, REAL Driver’s License, Global Entry Card, U.S. Government CAC/PIV. You agree to complete a Minimum Background Investigation (MBI) for a Moderate Public Trust position with the U.S. federal government and gain other clearances as deemed appropriate for the role.

  • Strong background in Computer Science or related engineering discipline
  • Must be located in North America
  • Systems engineering experience in an enterprise-scale internet service engineering or support role
  • Expertise in TCP/IP related technologies (networking protocols, network programming, etc.)
  • Expertise in CLI enterprise support of Unix variants (Linux/Solaris/BSD), with strong Linux/UNIX knowledge and significant exposure to Red Hat Enterprise Linux and Solaris
  • Strong understanding of monitoring security systems and administration
  • Strong communication skills (written and oral)
  • Past experience in Incident Management and a good understanding of ITIL service operations
  • Willingness to work in a 24/7 team managing large data centers
  • Availability for shift work and on-call duty if required
  • Experience provisioning, operating, and running AWS/C2S based infrastructure and systems
  • Experience writing scripts in Python, Go, or other languages

Preferred Qualifications:

  • Prior Chef/Puppet or automated deployment experience
  • Prior Jenkins/Bamboo/Spinnaker pipeline execution experience
  • Experience in supporting and maintaining monitoring and alert systems
  • Experience in supporting and maintaining Java applications
  • Hands-on experience configuring and running AWS (Amazon Web Services), using the CLI/SDKs
  • Certifications in Linux+, Red Hat, and AWS
  • Experience in supporting and leading Kubernetes-based applications and services
  • Familiarity with Agile processes and DevOps
  • Experience taking part in blameless retrospectives, learning from incidents, and conducting post-incident investigations, including incident analysis as well as performance evaluations of responders
  • Working knowledge of and interest in resilience engineering, including concepts such as Safety-II (looking at how things go right instead of how things go wrong), being proactive instead of reactive, and investigating complex sociotechnical systems

This candidate must be a U.S. citizen (U.S. born or naturalized) who does not hold dual citizenship and agrees to complete a U.S. federal government Minimum Background Investigation (MBI) for a Moderate Public Trust position.
