Senior SRE Engineer

3 Months ago • 5 Years +

Devops

Job Description

The Senior SRE Engineer will ensure the reliability, availability, and performance of production systems, especially during weekends, and will troubleshoot and resolve critical issues in a fast-paced, high-availability environment. Responsibilities include automating processes, designing scalable infrastructure solutions, improving observability, leading post-incident reviews, and maintaining documentation. The role involves working with various teams to identify and resolve infrastructure issues, setting up automation for alerting, and monitoring alert channels, logs, and infrastructure load across the entire stack. The candidate should be able to communicate effectively in English.

Good To Have:

Exposure to Execution Management Systems (EMS) / Portfolio Management Systems (PMS).
Experience with client-impact triage.
Proficiency with Datadog or similar observability platforms.
Knowledge of serverless architectures.
Familiarity with RDBMS and NoSQL databases.
Prior experience in fintech or trading platforms.
Strong understanding of API integrations.
Excellent problem-solving and communication skills.
Experience working with client-facing teams.

Must Have:

5+ years of experience in SRE/DevOps.
Strong experience with AWS or GCP services.
Experience with automation tools like Terraform.
Proficiency in scripting languages (Python, Bash, Go).
Solid understanding of Linux systems and cloud architectures.
Experience with container orchestration (Kubernetes).
Proficient with CI/CD pipelines.
Ability to troubleshoot complex systems.
Comfortable working autonomously on weekend shifts.

Perks:

Flexible working format.
Competitive salary and compensation package.
Personalized career growth.
Professional development tools (mentorship, training).
Active tech communities with knowledge sharing.
Education reimbursement.
Anniversary presents.
Corporate events and team buildings.
Other location-specific benefits.

N-iX is a global company with Ukrainian roots that helps businesses across the world develop successful software products. Founded in 2002, N-iX has since expanded its presence to nine countries spanning Europe, the US, and Latin America. Today, we are a strong community of 2,000+ professionals and a reliable partner for global industry leaders and Fortune 500 companies.

About the Customer

A global UK-based fintech company that provides institutional-grade infrastructure for digital asset trading. Their platform supports execution management, liquidity access, and portfolio risk monitoring for hedge funds, banks, and asset managers, ensuring high availability and reliability of trading systems.

Technology Stack of the Customer

Cloud: AWS, GCP
Infrastructure: Terraform, Kubernetes, Docker
Monitoring: Datadog, CloudWatch, Prometheus
Languages: Python, Bash, Go
CI/CD: GitHub Actions, Jenkins
Observability: ELK stack, Grafana

Responsibilities

Ensure the reliability, availability, and performance of production systems, particularly during weekends.
Take ownership of monitoring, troubleshooting, and incident response during weekends and off-hours.
Actively participate in on-call rotations with priority weekend shifts (Saturday–Sunday).
Troubleshoot and resolve critical issues in a fast-paced, high-availability environment.
Automate manual processes and workflows, reducing operational overhead.
Work closely with engineering teams to design and deploy scalable, fault-tolerant infrastructure solutions on AWS or GCP.
Improve observability by utilizing monitoring, logging, and alerting systems (e.g., CloudWatch, Datadog).
Lead post-incident reviews, contribute to the continuous improvement of system reliability, and follow up on strategic fixes.
Develop and update runbooks, incident response playbooks, and documentation.
Work closely with Engineering, Product, and Client teams to proactively identify infrastructure pain points that could affect the user experience.
Monitor alert channels, logs, and infrastructure load for the entire stack.
Set up automation for alerting.
Communicate effectively in English with both technical and non-technical stakeholders.

Must-Have Skills

5+ years of experience in an SRE, DevOps, or infrastructure engineering role.
Strong experience with AWS or GCP, including services such as EC2, Lambda, S3, and RDS (AWS) or GKE (GCP).
Experience with automation tools like Terraform.
Proficient in at least one scripting language (Python, Bash, Go, etc.).
Solid understanding of Linux systems, networking, and cloud-based architectures.
Experience working with container orchestration platforms like Kubernetes.
Proficient with CI/CD pipelines, preferably with cloud-native tools (e.g., GitHub Actions).
Ability to troubleshoot complex, distributed systems and provide solutions in high-pressure environments.
Comfortable working autonomously during weekend shifts (as part of a Thursday–Monday schedule).

Nice-to-Have Skills

Exposure to Execution Management Systems (EMS) / Portfolio Management Systems (PMS).
Experience with client-impact triage, working cross-functionally with account managers or product teams.
Proficiency with Datadog or similar observability platforms.
Knowledge of serverless architectures (e.g., AWS Lambda, GCP Cloud Functions).
Familiarity with RDBMS and NoSQL databases, such as RDS, CloudSQL, DynamoDB.
Prior experience in fintech, trading platforms, or 24/7 financial infrastructure.
Strong understanding of API integrations and how infrastructure issues might manifest in client environments.
Excellent problem-solving and communication skills, with the ability to translate technical incidents into clear client updates.
Experience working with client-facing teams.

We offer*:

Flexible working format - remote, office-based or flexible
A competitive salary and good compensation package
Personalized career growth
Professional development tools (mentorship program, tech talks and trainings, centers of excellence, and more)
Active tech communities with regular knowledge sharing
Education reimbursement
Memorable anniversary presents
Corporate events and team buildings
Other location-specific benefits

*not applicable for freelancers
