Site Reliability Engineer (SRE) - grok.com & API
xAI
Job Summary
xAI's mission is to create AI systems that understand the universe and aid humanity. The team is small, highly motivated, and focused on engineering excellence, operating with a flat structure where all employees contribute directly. This SRE role is on the backend services team for grok.com and the API, which is based primarily in London with a growing Palo Alto presence. The team builds scalable, reliable services that process tens of thousands of queries per second on Kubernetes clusters. Ideal candidates have expert knowledge of Kubernetes, continuous deployment systems such as Buildkite and ArgoCD, monitoring technologies such as Prometheus, Grafana, and PagerDuty, and infrastructure as code technologies such as Pulumi or Terraform.
Must Have
- Expert knowledge of Kubernetes
- Expert knowledge of continuous deployment systems such as Buildkite and ArgoCD
- Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty
- Expert knowledge of infrastructure as code technologies such as Pulumi or Terraform
Perks & Benefits
- Equity
- Comprehensive medical coverage
- Vision coverage
- Dental coverage
- 401(k) retirement plan
- Short-term disability insurance
- Long-term disability insurance
- Life insurance
- Various other discounts and perks
Job Description
About xAI
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All engineers are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.
About the team
You will work on the team that is responsible for the backend services that power grok.com and our API. Our team is currently based primarily in London with a small but growing number of engineers located in Palo Alto. We focus on writing highly scalable and reliable services that can efficiently process tens of thousands of queries per second. The services are hosted on a number of Kubernetes clusters (on-prem & cloud).
About the role
An ideal candidate meets at least the following requirements:
1. Expert knowledge of Kubernetes,
2. Expert knowledge of continuous deployment systems such as Buildkite and ArgoCD,
3. Expert knowledge of monitoring technologies such as Prometheus, Grafana, and PagerDuty,
4. Expert knowledge of infrastructure as code technologies such as Pulumi or Terraform.
Location
We hire engineers in London and in Palo Alto. We usually work from the office 5 days a week but allow for work-from-home days when required. Candidates joining the London team must be willing to attend late meetings at least once a week to coordinate with the rest of our team.
Interview process
After you submit your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview (“phone interview”) during which a member of our team will ask some basic technical questions. If you clear the initial phone interview, you will enter the main process, which consists of two technical interviews.
All interviews will be conducted via Google Meet.
Annual Salary Range
$180,000 - $440,000 USD
Benefits
Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.
xAI is an equal opportunity employer.