Senior Engineer, Site Reliability Engineer

DevOps

Job Description

LSEG is seeking Site Reliability Engineers (SREs) to support its Real Time application space, providing global clients with trading and market data. The role involves mitigating operational risk, driving cloud migrations, automating tasks, creating observability components, and developing subject matter expertise. This globally distributed team offers a fast-paced environment to hone skills in real-time data processing.

Good To Have:

Experience in real-time or low-latency environments.
Experience with market data systems.
DataDog and BigPanda experience is highly desirable.
Customer support experience.

Must Have:

Lead incident recovery, including troubleshooting and coordinating actions.
Write detailed incident reports and advocate for customers in post-incident reviews.
Ensure a stable production environment by safely delivering changes and assessing deployment risks.
Automate operational processes to reduce manual work and eliminate unnecessary alerts.
Develop expertise in application dataflows, networking topologies, and troubleshooting knowledge bases.
Collaborate with development teams to prioritize iterative improvements to production environments.
Serve as SRE point of contact for new projects, ensuring application designs meet supportability standards.
Extensive experience in UNIX administration and scripting, including shell scripting and automation.
Practical experience supporting cloud-native applications.
Advanced understanding of networking concepts such as TCP/IP, HTTP, and DNS resolution.
Experience in configuring and using Kubernetes, Docker, and container-based applications.
Proven experience in troubleshooting large distributed systems.
Expertise in working with and maintaining observability tooling.
Strong grasp of version control systems, particularly Git.
A bachelor’s degree in computer science or a related technical field, or equivalent practical experience.
A minimum of 8 years of work experience in the industry.

Perks:

Healthcare
Retirement planning
Paid volunteering days
Wellbeing initiatives

We are hiring Site Reliability Engineers (SREs) to expand our team supporting LSEG’s Real Time application space. This space provides thousands of clients with access to trading and market data globally. Our team efficiently mitigates operational risk, enabling fast product innovation and enhancing customer experience.

As an SRE, you will own operational support, drive cloud migrations, and automate operational tasks. You will also create dynamic observability components and develop subject matter expertise. Join our diverse, globally distributed team of multi-faceted technical professionals. You will have the opportunity to hone your skills in a fast-paced environment that processes real-time data.

Key Responsibilities:

Lead Incident Recovery: Direct the recovery of incidents, analyze facts quickly, perform troubleshooting activities, and coordinate actions through incident recovery meetings.
Incident Reporting and Customer Advocacy: Write detailed incident reports, advocate for customers in post-incident reviews, and review and approve customer statements.
Production Environment Stability: Ensure a stable production environment by safely delivering changes and thoroughly assessing deployment risks.
Operational Process Improvement: Automate operational processes to reduce manual work. Ensure all alerts are actionable and collaborate with development teams to eliminate unnecessary alerts.
Subject Matter Expertise: Develop expertise in application dataflows and networking topologies and maintain comprehensive troubleshooting knowledge bases.
Collaboration with Development Teams: Collaborate closely with development teams to prioritize iterative improvements to production environments in the product backlog.
Project Intake and Delivery: Serve as the SRE point of contact for new projects. Collaborate with project delivery teams to produce high-quality project artifacts, ensure application designs meet supportability standards, conduct operational acceptance testing, and lead Game Day activities.

Technical Qualifications:

Extensive experience in UNIX administration and scripting, including shell scripting and automation.
Practical experience supporting cloud-native applications, with a preference for AWS or Azure.
Advanced understanding of networking concepts such as TCP/IP, HTTP, and DNS resolution.
Experience in real-time or low-latency environments is a plus.
Experience in configuring and using Kubernetes, Docker, and container-based development and applications.
Proven experience in troubleshooting large distributed systems; experience with market data systems is a plus.
Expertise in working with and maintaining observability tooling (DataDog and BigPanda experience is highly desirable).
Strong grasp of version control systems, particularly Git.
A bachelor’s degree in computer science or a related technical field involving software/systems engineering, or equivalent practical experience.
A minimum of 8 years of work experience in the industry, with customer support experience being a plus.