Senior Site Reliability Engineer

1 Hour ago • 8 Years + • $155,000 PA - $250,000 PA

Job Summary

Job Description

As a Senior Site Reliability Engineer (SRE) at Glean, you will play a crucial role in ensuring the reliability, availability, and performance of cloud-based services and applications. Your responsibilities will include designing, building, and maintaining robust, scalable, and highly available cloud infrastructure. You will drive technical excellence and set best practices for incident management, performance optimization, and automation while collaborating with engineering teams. You will also participate in incident management, optimize the on-call process, develop automation scripts and tools, and tune infrastructure for performance and cost-effectiveness. Furthermore, you will collaborate on security, implement monitoring systems, and contribute SRE insights throughout the software development lifecycle.
Must have:
  • 8+ years of experience in SRE or similar role
  • 5+ years of software development experience
  • 2+ years of experience managing people or teams
  • Strong knowledge of cloud platforms such as Google Cloud Platform, AWS, or Azure
  • Experience with containerization technologies, including Docker and Kubernetes
Perks:
  • Competitive compensation
  • Medical, Vision and Dental coverage
  • Flexible work environment and time-off policy
  • 401k
  • Company events
  • A home office improvement stipend when you first join
  • Annual education stipend
  • Wellness stipend
  • Healthy lunches and dinners provided daily

Job Details

About Glean

We’re on a mission to make knowledge work faster and more humane. We believe that AI will fundamentally transform how people work. In the future, everyone will work in tandem with expert AI assistants who find knowledge, create and synthesize information, and execute work. These assistants will free people up to focus on the higher-level, creative aspects of their work.

We’re building a system of intelligence for every company in the world. On the surface, you can think of it as Google + ChatGPT for the enterprise. Under the hood, our platform is the connective tissue between AI and knowledge. It brings all of a company’s knowledge together, understands it at a deep level, provides industry-leading search relevance over it, and connects it to generative AI agents and applications.

Glean was founded by a seasoned team of former Google search and Facebook engineers who saw a need in the enterprise space for their technical depth and passion for AI. We’re a diverse team of curious and creative people who want to help each other get big things done—so we can help other teams do the same. 

We're backed by some of the Valley's leading venture capitalists—including Sequoia, Kleiner Perkins, Lightspeed, and General Catalyst—and have assembled a world-class team with senior leadership experience at Google, Slack, Facebook, Dropbox, Rubrik, Uber, Intercom, Pinterest, Palantir, and others.

Role

We are seeking a skilled and motivated Senior Site Reliability Engineer (SRE) to join our dynamic and innovative team. As an SRE, you will play a critical role in ensuring the reliability, availability, and performance of our cloud-based services and applications. You will work closely with our engineering teams to design, build, and maintain robust, scalable, and highly available cloud infrastructure.

Much of our software development focuses on building infrastructure to scale our operations in a hybrid cloud environment and on eliminating manual work through automation. On the SRE team, you’ll have the opportunity to tackle the complex challenges of scale and fast growth that are unique to Glean, while using your expertise in coding, algorithms, problem-solving, and SRE practices. We keep Glean applications up and running, ensuring our customers have the best and most reliable experience possible.

What will you do and achieve:

  • Technical Leadership and Mentorship: Play a key role in driving technical excellence and fostering a culture of reliability across engineering teams. You will lead by example, setting best practices for incident management, performance optimization, and automation. Influence best practices, drive cross-team collaborations, and contribute to the execution of key objectives in alignment with engineering leadership and cross-functional partners. Establish strong technical credibility, shaping architectural decisions and ensuring the delivery of high-quality, reliable systems.
  • Ensure High Availability: Implement and maintain resilient cloud architectures, monitor system performance, and proactively identify and resolve potential bottlenecks or points of failure. 
  • Incident Management: Participate in the primary on-call rotation; cultivate technical curiosity, a growth mindset, and a blameless postmortem culture within the team. Continuously optimize the on-call process for sustainability and efficiency.
  • Automation and Tooling: Develop and maintain automation scripts, tools, and processes to streamline system deployment, monitoring, and management tasks. Your contributions will be vital in efficiently scaling cloud operations.
  • Performance Optimization: Optimize cloud infrastructure and applications for performance, scalability, and cost-effectiveness.
  • Security and Compliance: Collaborate with security engineers to implement best practices and ensure compliance with security standards and policies.
  • Monitoring and Alerting: Design and configure advanced monitoring systems to gain insights into system behavior, set up alerts, and respond proactively to potential issues. Create and maintain comprehensive dashboards and playbooks for production on-call.
  • Software Development Consultation: Engage actively in the entire software development lifecycle. Participate in system design reviews and provide valuable SRE insights during launch reviews, influencing and enhancing system architecture.

Who you are:

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 8+ years of experience in a senior-level Site Reliability Engineering or similar role, particularly in managing cloud-based services and infrastructure.
  • 5+ years of experience with software development in one or more programming languages.
  • 2+ years of experience managing people or teams, leading projects, and designing, analyzing, and troubleshooting distributed systems running in the cloud.
  • Strong knowledge of cloud platforms such as Google Cloud Platform, AWS, or Azure.
  • Practical experience with containerization technologies, including Docker and Kubernetes. Familiarity with infrastructure as code tools like Terraform is essential.
  • Solid understanding of networking, security principles, and best SRE and security practices.
  • Proficiency in using monitoring and alerting tools to detect and respond to potential issues effectively.

For California-based applicants:

The standard base salary range for this position is $155,000 - $250,000 annually. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits.

We are a diverse bunch of people and we want to continue to attract and retain a diverse range of people into our organization. We're committed to an inclusive and diverse company. We do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.
