SRE-II

Posted 6 months ago • Experience: 3+ years
DevOps

Job Description

As an SRE II, you will play a key role in ensuring the reliability, performance, and scalability of mission-critical systems. You will work with engineering teams to design, implement, and maintain infrastructure for high-volume data-intensive applications. Responsibilities include maintaining system reliability, designing and enhancing monitoring solutions, collaborating with development teams, managing distributed systems in a Linux-based environment, leveraging AWS cloud services, utilizing Kubernetes, implementing CI/CD pipelines, using infrastructure as code, implementing observability best practices, performing root cause analysis, and ensuring security best practices.
Must Have:
  • 3+ years of SRE experience.
  • Strong Linux fundamentals.
  • Expertise in troubleshooting and problem-solving.
  • Experience with logging and monitoring solutions.
  • Proficiency in at least one programming language (Python preferred).
  • Experience with AWS and Kubernetes.
  • Exposure to CI/CD pipelines.
  • Experience with infrastructure as code.
  • Familiarity with observability tools.

Who we are

Mindtickle is the market-leading revenue productivity platform that combines on-the-job learning and deal execution to get more revenue per rep. Mindtickle is recognized as a market leader by top industry analysts and is ranked by G2 as the #1 sales onboarding and training product. We’re honoured to be recognized as a Leader in the first-ever Forrester Wave™: Revenue Enablement Platforms, Q3 2024!

Job Snapshot

As an SRE II, you will play a key role in ensuring our mission-critical systems' reliability, performance, and scalability. You will work closely with engineering teams to design, implement, and maintain infrastructure that supports high-volume data-intensive applications. Your expertise in monitoring, troubleshooting, and automation will drive operational excellence across our distributed environment.

What’s in it for you?

    • Maintain and improve the reliability, availability, and performance of high-volume, data-intensive applications.
    • Design, implement, and enhance monitoring, logging, and alerting solutions at scale.
    • Collaborate with development teams to optimize system architecture and reliability.
    • Manage and troubleshoot distributed systems in a Linux-based production environment.
    • Leverage AWS cloud services to scale infrastructure efficiently.
    • Utilize Kubernetes for container orchestration, ensuring optimal resource utilization and deployment strategies.
    • Implement CI/CD pipelines using GitLab to automate deployments and operational tasks.
    • Use infrastructure as code (IaC) tools such as Terraform and CloudFormation for provisioning and managing cloud resources.
    • Implement observability best practices using Grafana, Prometheus, Thanos, and Loki.
    • Perform root cause analysis (RCA) and proactively address performance bottlenecks and system failures.
    • Ensure security best practices and compliance across all infrastructure components.

We’d love to hear from you, if you:

    • Have 3+ years of experience in Site Reliability Engineering or related fields.
    • Possess strong Linux fundamentals with a deep understanding of system internals.
    • Have expertise in troubleshooting and problem-solving in distributed environments.
    • Have hands-on experience with logging and monitoring solutions at scale.
    • Are proficient in at least one programming language (preferably Python).
    • Have strong experience with AWS services and Kubernetes.
    • Have exposure to CI/CD pipelines, preferably using GitLab CI/CD.
    • Have experience with infrastructure as code (Terraform, CloudFormation).
    • Are familiar with observability tools such as Grafana, Prometheus, Thanos, and Loki.

Preferred Qualifications

    • Experience in performance tuning and capacity planning.
    • Knowledge of incident management and post-mortem analysis processes.
    • Familiarity with security best practices in cloud environments.
    • Experience in automating operational tasks using scripting and configuration management tools.

Our culture & accolades

As an organization, it’s our priority to create a highly engaging and rewarding workplace. We offer tons of awesome perks and many opportunities for growth.

Our culture reflects our employees' globally diverse backgrounds, our commitment to our customers and to each other, and a passion for excellence. We live up to our values, DAB: Delight your customers, Act as a Founder, and Better Together.

Mindtickle is proud to be an Equal Opportunity Employer.

All qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, national origin, disability, protected veteran status, or any other characteristic protected by law.

Your Right to Work - In compliance with applicable laws, all persons hired will be required to verify identity and eligibility to work in the respective work locations and to complete the required employment eligibility verification document form upon hire.
