Sr Software Developer / Site Reliability Engineer - ASE iCloud Content

5 Years + • $171,600 PA - $258,100 PA
Devops

Job Description

The Services Engineering (ASE) team is seeking a passionate and talented Sr Software Developer / Site Reliability Engineer for iCloud Content. This role involves designing, engineering, and running products and platforms that scale globally and maintain high availability for hundreds of millions of users. Responsibilities include operating, monitoring, and triaging production environments, implementing telemetry systems, automating deployments, and participating in capacity planning and disaster recovery exercises. The ideal candidate will solve complex problems using data, teamwork, and expertise to ensure the highest quality Services experience.
Good To Have:
  • Fast learner who is generous with their knowledge
  • Experience with disaster recovery, capacity planning and chaos testing
  • Being curious about how systems work and, more importantly, how they fail
  • An acute drive to build bots that automate away repetitive tasks
  • Working knowledge of microservices architecture and container orchestration with Kubernetes or similar technologies, preferably in a large-scale production environment
  • Experience with managing large numbers of diverse systems with configuration management and software delivery platforms (such as Spinnaker, Terraform, Puppet, Chef or Ansible) in a public, private, or hybrid cloud environment
  • Experience with Linux/Unix, Networking, Systems Management, Systems Security
  • Experience using modern object storage systems such as S3 or GCS
  • Familiarity with large-scale observability systems such as Prometheus, Grafana, or Splunk
  • A track record of partnering with peers to foster solid engineering principles
  • Strong belief in acquiring and spreading knowledge via mentorship
Must Have:
  • Operate, monitor, and triage all aspects of our production and non-production environments.
  • Pioneer and implement the next-generation telemetry system.
  • Prepare alert handling procedures and runbooks, and collaborate with the offshore SRE teams.
  • Automate deployment and orchestration of services into the cloud environment as well as other routine processes.
  • Actively participate in capacity planning, scale testing, and disaster recovery exercises.
  • Interact with and support partner teams, including engineering, QA, and program management.
  • Cultivate and maintain relationships with internal and external third-party vendors.
  • 5+ years of software development or production operations experience in a large-scale environment
  • BS or MS in Computer Science or related field
  • Experience in managing and scaling large distributed systems in a public, private, or hybrid cloud environment
  • An inherent bias for action, strong sense of ownership and integrity demonstrated through clear communication and collaboration
  • Experience with deploying and supporting new and existing services, platforms, and application stacks
  • Familiarity with cloud infrastructure concepts (zones, regions, VPCs, etc.)
  • Excellent troubleshooting and problem-solving skills
  • Skills and experience in monitoring, alerting, fault analysis, and automation
  • The ability to design, author and release code in languages like Java, Go, or Python
  • Ability to participate in on-call service support
  • Lead incident response and root cause analysis of production systems
Perks:
  • Comprehensive medical and dental coverage
  • Retirement benefits
  • Range of discounted products and free services
  • Reimbursement for certain educational expenses (including tuition) for formal education related to advancing your career
  • Opportunity to become a shareholder through participation in discretionary employee stock programs
  • Eligibility for discretionary restricted stock unit awards
  • Ability to purchase stock at a discount if voluntarily participating in Employee Stock Purchase Plan
  • Role might be eligible for discretionary bonuses or commission payments
  • Role might be eligible for relocation

People don’t just build products; they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join us, and help us leave the world better than we found it. The Services Engineering (ASE) team builds and provides platforms, services and infrastructure that fuel our services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which our software developers build the products that our customers love. We are looking for passionate and talented engineers to continue our focus on providing our customers the highest quality Services experience. Our services have to scale globally, stay highly available, and "just work." If you love designing, engineering and running products and platforms that will help millions of customers, then this is the place for you!

Services’ scale is BIG. Operating at our scale, across multiple geographies and serving hundreds of millions of users, presents unique challenges. As a Software Developer in SRE, you'll need to solve these problems using data, teamwork, and your own expertise. ASE Products Site Reliability teams are responsible for the reliability and performance of the server software stack that powers products like iCloud Photos, Mail, Drive, Backup and many more. We do that by focusing on reliability best practices from service inception to production, collaborating deeply with product development teams to deliver a superlative product and shared vision while leveraging data and automation as first principles. We run a mix of open source, vendor-licensed, and internally developed tools to manage the end-to-end SDLC of our products. You'll learn these tools and have opportunities to improve them. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.

At our company, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $171,600 and $258,100, and your base pay will depend on your skills, qualifications, experience, and location.

Our employees also have the opportunity to become shareholders through participation in our discretionary employee stock programs. Our employees are eligible for discretionary restricted stock unit awards, and can purchase stock at a discount if voluntarily participating in our Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career with us, reimbursement for certain educational expenses (including tuition). Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation.

Note: Our benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

We are an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics.
