System Reliability Engineer

1 Day ago • All levels

Job Summary

Job Description

As a System Reliability Engineer, you will manage AWS, Linux, and Windows Server environments, automate AWS operations with Python and boto3, and maintain monitoring and security practices. Responsibilities include implementing monitoring tools, developing automation scripts, performing patch management, using infrastructure as code tools, analyzing system logs, collaborating on dashboards, enforcing security best practices, providing technical guidance, and creating detailed documentation. You will also continuously identify and implement process improvements that enhance system reliability and performance.
Must have:
  • Proficiency in Python with boto3 for AWS automation
  • Strong experience managing AWS services including EC2, Lambda, S3, and Systems Manager
  • Familiarity with monitoring solutions
  • Experience with infrastructure as code tools
  • Hands-on experience with Linux and Windows Server management
  • Experience with version control systems like Git
  • Knowledge of automation tools such as Ansible, Puppet, or Chef
  • Proficiency in scripting languages like Bash and PowerShell
  • Understanding of patch management, vulnerability assessment, and security best practices
  • Experience with incident response, troubleshooting, and root cause analysis (RCA)
Good to have:
  • Familiarity with AWS Well-Architected principles
  • Experience with Control Tower and cross-account IAM permissions
  • AWS Certifications: DevOps Engineer: Professional, Solutions Architect: Associate, SysOps Administrator: Associate, Developer: Associate
Perks:
  • Remote Work Opportunities
  • Flexible Work Hours

Job Details

About us:

Working at Tech Holding isn't just a job; it's an opportunity to be part of something bigger. We are a full-service consulting firm founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have held senior positions in a wide variety of companies – from emerging startups to large Fortune 50 firms – and we have combined that experience into a unique approach built on the principles of deep expertise, integrity, transparency, and dependability.

The Role:

As a System Reliability Engineer, you will play a critical role in managing AWS environments, implementing automation with Python and boto3, and maintaining robust monitoring and security practices. Your expertise in AWS Systems Manager, including Run Command and Patch Manager, will ensure high availability and performance across systems. If you thrive on solving complex problems, optimizing cloud infrastructures, and working with cutting-edge AWS technologies, we want to hear from you!

Responsibilities:

  • Manage, configure, and maintain AWS, Linux (e.g., Amazon Linux, CentOS), and Windows Server environments.
  • Implement and maintain monitoring tools such as Amazon CloudWatch, Dynatrace, or Datadog to track performance and ensure reliability.
  • Develop and manage automation scripts using Python and boto3 for AWS operations, including EC2, S3, and Lambda management.
  • Perform regular patch management, vulnerability assessments, and remediation using AWS Systems Manager (Run Command, Patch Manager).
  • Use infrastructure as code (IaC) tools like AWS CloudFormation or Terraform for automated and repeatable infrastructure setups.
  • Analyze system logs and performance metrics to proactively identify and resolve issues.
  • Collaborate with stakeholders to create dashboards and alerts for proactive performance monitoring.
  • Enforce security best practices and manage IAM roles, cross-account permissions, and secure access policies.
  • Provide technical guidance and support for resolving complex system issues and conducting root cause analysis (RCA).
  • Create and maintain detailed documentation for system configurations, processes, and incident reports.
  • Continuously identify and implement opportunities for process improvement to enhance system reliability and performance.

Required Skills:

  • Proficiency in Python, with hands-on experience using boto3 for AWS automation.
  • Strong experience managing AWS services, including EC2, Lambda, S3, and Systems Manager (Run Command, Patch Manager).
  • Familiarity with monitoring solutions such as Amazon CloudWatch, Dynatrace, or Datadog.
  • Experience with infrastructure as code (IaC) tools like AWS CloudFormation or Terraform.
  • Hands-on experience with Linux (Amazon Linux, CentOS) and Windows Server management.
  • Experience with version control systems like Git.
  • Knowledge of automation tools such as Ansible, Puppet, or Chef.
  • Proficiency in scripting languages like Bash and PowerShell.
  • Solid understanding of patch management, vulnerability assessment, and security best practices.
  • Experience with incident response, troubleshooting, and root cause analysis (RCA).

Nice to Have:

  • Familiarity with AWS Well-Architected principles.
  • Experience with Control Tower and cross-account IAM permissions.
  • AWS Certifications:
    • DevOps Engineer: Professional
    • Solutions Architect: Associate
    • SysOps Administrator: Associate
    • Developer: Associate

What We Offer:

  • Remote Work Opportunities
  • Flexible Work Hours

Tech Holding is proud to be an Equal Opportunity Employer and is committed to fostering a diverse and inclusive workplace. We welcome applicants from all backgrounds and experiences, and we consider qualified applicants without regard to race, color, religion, gender, sexual orientation, gender identity, national origin, disability, veteran status, or any other legally protected characteristic. If you require accommodation in the application process, please contact our HR 
