Staff Software Engineer, Infrastructure

All levels • Devops • $192,200 PA - $334,600 PA

Job Summary

Job Description

The Webapp Infrastructure (WIN) pillar provides tools for hundreds of developers to work safely and productively in a large codebase. This role is for the Webapp Infra Reliability Engineering (WIRE) team, which develops, runs, and scales core components of Slack’s Webapp Infrastructure and Product, including API servers, asynchronous job processing, caching, and rate limiting. The team focuses on improving the visibility, speed, and safety of Slack’s distributed application architecture, and on driving reliability, infrastructure upgrades, and operational efficiency. The Staff Software Engineer will support Webapp’s infrastructure, define reliability solutions, automate maintenance, manage deployments, and participate in the on-call rotation.
Must have:
  • Familiarity and experience with software development, including traditional operations and/or infrastructure tooling.
  • Experience managing critical production infrastructure, maintaining reliability and uptime, and having a customer-first view of operational safety.
  • Experience with functional or imperative programming languages such as Ruby and Go.
  • Experience with Chef, Terraform, cloud infrastructure (ideally AWS), IAM, Docker, Linux, and observability tools such as Logstash, Kibana, Prometheus, and Grafana.
  • Strong collaboration skills.
  • Familiarity with operational metrics, experience with incident management and strong debugging skills.
  • Bachelor’s degree in Computer Science, Engineering or related field, or equivalent training or work experience.
Good to have:
  • Experience as a Site Reliability Engineer (SRE), or as a platform or infrastructure engineer building and managing reliability mechanisms on distributed infrastructure.
  • Comfortable with deploying, operating and debugging distributed systems on Linux at scale.
  • Experience with AWS infrastructure at scale.
  • Experience with HHVM, mcrouter, and memcached.
  • Ability to dig deep across multiple layers of the stack, from networking and virtualization to configuration management and packaging.
  • Experience working within highly regulated environments where an understanding of FedRAMP/NIST frameworks was essential.
Perks:
  • Time off programs
  • Medical insurance
  • Dental insurance
  • Vision insurance
  • Mental health support
  • Paid parental leave
  • Life and disability insurance
  • 401(k)
  • Employee stock purchasing program

Job Details

About the Team

The Webapp Infrastructure (WIN) pillar provides the tools that make it possible for hundreds of developers to work in a multi-million-line codebase with safety and productivity at the forefront. WIN handles maintenance and upgrades of the Hack programming language, static analysis tooling, and widely used libraries in the codebase, as well as tuning and debugging the HHVM runtime and the services it depends on. The pillar has two teams, Runtime, Async Services, and Core Libraries (RASCL) and Webapp Infra Reliability Engineering (WIRE), and supports the middle layers of the stack, above our compute infrastructure and below the product code. This role is open for the WIRE team.

The WIRE team develops, runs, and scales core components of Slack’s Webapp Infrastructure and Product. We own, maintain, and improve the systems that power Slack’s API servers, asynchronous job processing, caching, and rate limiting. We continuously seek to improve the visibility, speed, and safety of Slack’s distributed application architecture! Part of the team’s charter is also to drive high-priority efforts for reliability, infrastructure upgrades, migrations, capacity planning, operational efficiency, and simplification.

We know we’ve done our job correctly when none of our users think about us. In other words, Slack just works seamlessly!

On this team, you will combine your software and systems engineering expertise to run large-scale, distributed, fault-tolerant services. We welcome new perspectives and strategies to address evolving challenges to reliability. We collaborate with many Infrastructure and Product engineering teams at Slack to continuously improve shared technology and processes.

What you will be doing

  • Directly support multiple components of Webapp’s infrastructure, including monitoring and visibility automation, and other infrastructure tooling.
  • Define and build solutions to improve the reliability and resilience of our services.
  • Write code to automate maintenance and reduce the need for manual intervention.
  • Help define SLA/SLOs for Webapp infrastructure, manage code deployments, fixes and software updates, and automate our operational processes.
  • Have an operational responsibility in addition to being a software developer. You will participate in the team's on-call rotation, assist with triaging and addressing production issues, and respond to incidents.
  • Review code and get your code reviewed; mentor and be mentored by other engineers. Teamwork is what makes the dream work.

What you should have

  • Familiarity and experience with software development, including traditional operations and/or infrastructure tooling.
  • Experience managing critical production infrastructure, maintaining reliability and uptime, and having a customer-first view of operational safety.
  • Experience with functional or imperative programming languages such as Ruby and Go.
  • Experience with Chef, Terraform, cloud infrastructure (ideally AWS), IAM, Docker, Linux, and observability tools such as Logstash, Kibana, Prometheus, and Grafana.
  • Strong collaboration skills: collaborating is core to how we operate and this excites you! To us, this means working with other teams on cross-functional projects as well as day-to-day collaboration.
  • Familiarity with operational metrics, experience with incident management and strong debugging skills.
  • Bachelor’s degree in Computer Science, Engineering or related field, or equivalent training or work experience.

Bonus Points

  • Experience as a Site Reliability Engineer (SRE), or as a platform or infrastructure engineer building and managing reliability mechanisms on distributed infrastructure.
  • Comfortable with deploying, operating and debugging distributed systems on Linux at scale.
  • Experience with AWS infrastructure at scale.
  • Experience with HHVM, mcrouter, and memcached.
  • Ability to dig deep across multiple layers of the stack, from networking and virtualization to configuration management and packaging.
  • Experience working within highly regulated environments where an understanding of FedRAMP/NIST frameworks was essential.

Core Infrastructure is a diverse and inclusive team that treats its colleagues exceptionally well. We are happy to help you learn what you need to know, and we encourage and support each other’s growth, so you are not expected to have expertise across all of these areas. The team looks for people who are curious, inventive, and work to be a little better every single day. In our work together, we aim to be smart, humble, hardworking and, above all, collaborative. Come join us!

About The Company

We're Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And, we empower you to be a Trailblazer, too: driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good, you've come to the right place.

Atlanta, Georgia, United States (Hybrid)

Dublin, County Dublin, Ireland (Remote)

Bogota, Colombia (Remote)

Hyderabad, Telangana, India (Hybrid)

Mexico City, Mexico (Hybrid)

Dallas, Texas, United States (Hybrid)

Herndon, Virginia, United States (Remote)

Herndon, Virginia, United States (Hybrid)

Indianapolis, Indiana, United States (Hybrid)
