Site Reliability Engineer II - Reliability Engineering - Operations

1 Month ago • All levels
DevOps

Job Description

We are seeking a passionate Site Reliability Engineer II to join our dynamic team, focusing on ensuring the reliability and performance of CME Group's trading and clearing server environment. The role involves collaborating with engineering teams to monitor, maintain, and troubleshoot systems, improve observability, and contribute to incident response. Key responsibilities also include automating tasks for scalability, leading technical discussions, and supporting migration to Google Cloud Platform.

We are seeking a passionate SRE to join our dynamic team.

The Site Reliability Engineer II - Reliability Engineering & Operations will help ensure the reliability and performance of our trading and clearing server environment.

The successful applicant must be able to solve problems creatively, communicate effectively, and work both independently and collaboratively.

Key Responsibilities:

  • Collaborate with systems engineers, SREs, and product engineering teams to monitor, maintain, and troubleshoot our Markets systems technology platform.
  • Collaborate with stakeholders to improve observability and alerting of our platform to enable data-driven business decisions, faster issue detection and incident resolution.
  • Contribute to incident response and the on-call rotation: own and resolve escalated incidents, identifying and mitigating root causes across distributed computing architecture (client-server, intranet/internet), hardware platforms, and resources such as CPU, memory, virtualization, clustering, and cloud computing.
  • Take accountability for the delivery of moderately complex incidents, problems, and changes.
  • Lead technical discussions for own work and present solutions and approaches.
  • Support the migration of markets applications & infrastructure to Google Cloud Platform (GCP).
  • Assist in automating tasks to enhance system scalability and reliability.
  • Collaborate with cross-functional teams to improve system performance and efficiency.
  • Define problems and describe their cause-and-effect relationships; with supervision, gather and compare data about the problems and document the details.
  • Act as a mentor to L2 and L1 colleagues.

Skills:

  • Experience with Linux & Windows-based systems and Cloud-based platform(s).
  • Strong problem-solving and analytical abilities.
  • Excellent communication and teamwork skills.
  • Experience and knowledge of working with distributed systems.
  • Exposure to metrics and monitoring tools such as Splunk and Grafana.
  • Experience working with Infrastructure as Code.
  • Competent scripting skills (Python, Bash, etc.).
  • Eagerness to learn and adapt in a fast-paced trading environment.

Desirable:

  • Knowledge of Google Cloud Platform, GKE, and/or GCE.
  • Basic knowledge of networking (HTTP/TCP/UDP/IP).
  • Experience with Docker & Kubernetes.
  • Experience in Financial markets.
  • Experience working in an agile environment.

CME Group: Where Futures are Made

CME Group is the world’s leading derivatives marketplace. But who we are goes deeper than that. Here, you can impact markets worldwide. Transform industries. And build a career by shaping tomorrow. We invest in your success and you own it – all while working alongside a team of leading experts who inspire you in ways big and small. Problem solvers, difference makers, trailblazers. Those are our people. And we’re looking for more.

At CME Group, we embrace our employees' unique experiences and skills to ensure that everyone’s perspectives are acknowledged and valued. As an equal-opportunity employer, we consider all potential employees without regard to any protected characteristic.

Important Notice: Recruitment fraud is on the rise, with scammers using misleading promises of job offers and interviews to solicit money and personal information from job seekers. CME Group adheres to established procedures designed to maintain trust, confidence and security throughout our recruitment process.
