System Reliability Engineer

1 Month ago • All levels

Job Details

About us:

Working at Tech Holding isn't just a job; it's an opportunity to be part of something bigger. We are a full-service consulting firm founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have industry experience and have held senior positions at a wide variety of companies, from emerging startups to large Fortune 50 firms. We have taken our combined experiences and developed a unique approach supported by the principles of deep expertise, integrity, transparency, and dependability.

The Role:

As a System Reliability Engineer, you will play a critical role in managing AWS environments, implementing automation with Python and boto3, and maintaining robust monitoring and security practices. Your expertise in AWS Systems Manager, including Run Command and Patch Manager, will ensure high availability and performance across systems. If you thrive on solving complex problems, optimizing cloud infrastructures, and working with cutting-edge AWS technologies, we want to hear from you!

Responsibilities:

  • Manage, configure, and maintain AWS, Linux (e.g., Amazon Linux, CentOS), and Windows Server environments.
  • Implement and maintain monitoring tools such as Amazon CloudWatch, Dynatrace, or Datadog to track performance and ensure reliability.
  • Develop and manage automation scripts using Python and boto3 for AWS operations, including EC2, S3, and Lambda management.
  • Perform regular patch management, vulnerability assessments, and remediation using AWS Systems Manager (Run Command, Patch Manager).
  • Use infrastructure as code (IaC) tools like AWS CloudFormation or Terraform for automated and repeatable infrastructure setups.
  • Analyze system logs and performance metrics to proactively identify and resolve issues.
  • Collaborate with stakeholders to create dashboards and alerts for proactive performance monitoring.
  • Enforce security best practices and manage IAM roles, cross-account permissions, and secure access policies.
  • Provide technical guidance and support for resolving complex system issues and conducting root cause analysis (RCA).
  • Create and maintain detailed documentation for system configurations, processes, and incident reports.
  • Continuously identify and implement opportunities for process improvement to enhance system reliability and performance.

Required Skills:

  • Proficiency in Python, with hands-on experience using boto3 for AWS automation.
  • Strong experience managing AWS services, including EC2, Lambda, S3, and Systems Manager (Run Command, Patch Manager).
  • Familiarity with monitoring solutions such as Amazon CloudWatch, Dynatrace, or Datadog.
  • Experience with infrastructure as code (IaC) tools like AWS CloudFormation or Terraform.
  • Hands-on experience with Linux (Amazon Linux, CentOS) and Windows Server management.
  • Experience with version control systems like Git.
  • Knowledge of automation tools such as Ansible, Puppet, or Chef.
  • Proficiency in scripting languages like Bash and PowerShell.
  • Solid understanding of patch management, vulnerability assessment, and security best practices.
  • Experience with incident response, troubleshooting, and root cause analysis (RCA).

Nice to Have:

  • Familiarity with AWS Well-Architected principles.
  • Experience with AWS Control Tower and cross-account IAM permissions.
  • AWS Certifications:
    • DevOps Engineer: Professional
    • Solutions Architect: Associate
    • SysOps Administrator: Associate
    • Developer: Associate

What We Offer:

  • Remote Work Opportunities
  • Flexible Work Hours

Tech Holding is proud to be an Equal Opportunity Employer and is committed to fostering a diverse and inclusive workplace. We welcome applicants from all backgrounds and experiences, and we consider qualified applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, disability, veteran status, or any other legally protected characteristic. If you require accommodation in the application process, please contact our HR 

Similar Jobs

NVIDIA - Senior Site Reliability Engineer - AI Research Clusters

NVIDIA

Austin, Texas, United States (Hybrid)
3 Months ago
Palo Alto Networks - Sr Staff DevOps Engineer (Cortex)

Palo Alto Networks

Tel Aviv-Yafo, Tel Aviv District, Israel (On-Site)
1 Month ago
Telstra - Staff Engineer - Platform Engineering Security Specialist

Telstra

Australia (On-Site)
1 Month ago
Boomi - Senior Systems Development Engineer

Boomi

Bengaluru, Karnataka, India (On-Site)
1 Month ago
CGS Carrers - Billing System Analyst II

CGS Carrers

Bengaluru, Karnataka, India (Remote)
2 Weeks ago

Similar Skill Jobs

Granicus - Senior Site Reliability Engineer

Granicus

Bengaluru, Karnataka, India (Remote)
1 Year ago
LTI Mindtree - Specialist - Architecture

LTI Mindtree

Mexico (On-Site)
3 Weeks ago
Gaming Innovation Group - Infrastructure Engineer

Gaming Innovation Group

Community Of Madrid, Spain (Hybrid)
2 Months ago
Fortra - Senior CloudOps Engineer

Fortra

Eden Prairie, Minnesota, United States (On-Site)
2 Weeks ago
JMA - Principal Firmware Engineer - Radio

JMA

New Providence, New Jersey, United States (On-Site)
3 Weeks ago
Capgemini - Site Reliability Engineer

Capgemini

Pune, Maharashtra, India (On-Site)
1 Month ago
Velotio Technologies - Senior DevOps Engineer (GCP)

Velotio Technologies

Maharashtra, India (Remote)
2 Months ago
Google - Product Engineer, Cloud Compute and Storage

Google

Atlanta, Georgia, United States (On-Site)
1 Month ago
Zscaler - Senior Devops Engineer (Terraform/Security Solutions)

Zscaler

Bengaluru, Karnataka, India (Hybrid)
2 Weeks ago
London Stock Exchange - Associate DevOps Engineer

London Stock Exchange

Colombo, Western Province, Sri Lanka (On-Site)
3 Weeks ago

Jobs in Mexico

LTI Mindtree - TypeScript CSS and modern JavaScript ES6

LTI Mindtree

Mexico (On-Site)
2 Weeks ago
Springer Group - Telemarketing

Springer Group

Mexico City, Mexico (On-Site)
2 Weeks ago
Google - Senior Technical Recruiter (English, Spanish)

Google

Mexico City, Mexico City, Mexico (On-Site)
1 Month ago
Nubank - Treasury & ALM Specialist

Nubank

Mexico City, Mexico (On-Site)
2 Weeks ago
luxsoft - Murex Support Engineer

luxsoft

Mexico (Remote)
2 Weeks ago
plana technologies - 2D / 3D Artist

plana technologies

Mexico City, Mexico (Remote)
1 Month ago
Aptive - Quality Technician

Aptive

Ciudad Victoria, Tamaulipas, Mexico (On-Site)
2 Weeks ago
Nagarro - Staff Engineer, Java Fullstack

Nagarro

Mexico (Remote)
7 Months ago
Netflix - Manager, Production Finance Mexico

Netflix

Mexico City, Mexico City, Mexico (On-Site)
1 Month ago
Hogarth - Senior CGI Artist

Hogarth

Mexico City, Mexico (Hybrid)
3 Weeks ago

About The Company

Ahmedabad, Gujarat, India (On-Site)

Pune, Maharashtra, India (On-Site)

Mexico (Remote)

Santiago De Querétaro, Querétaro, Mexico (On-Site)

New York, New York, United States (Hybrid)

New York, New York, United States (Hybrid)

United States (Remote)
