Site Reliability Engineer II

Posted 4 months ago • 3+ years of experience

Job Description

The Site Reliability Engineer (SRE) will be responsible for the full system lifecycle including infrastructure provisioning, system configuration, monitoring, and incident response in production environments. They will work closely with development teams, operations teams, network engineers, database administrators, technology vendors, and partners to ensure application performance and availability. The SRE will guide incident responses, identify root causes, and provide solutions to mitigate and resolve issues. This role requires experience in high-traffic SaaS environments and expertise in delivering high availability. The SRE will also design and build cloud infrastructure, participate in performance analysis and capacity planning, manage platform scalability, and implement monitoring enhancements.

Good To Have:

Building PCI compliant systems
Working with infrastructure for payment processing systems
Developing high-volume transaction systems

Must Have:

3+ years of experience in operating high-traffic SaaS environments
Deep expertise in delivering high availability
Skills to build a fully automated cloud orchestration framework on AWS
Experience with containerized infrastructure in Production (Kubernetes, EKS, ECS)
Experience implementing configuration management solutions using Infrastructure as Code
Strong working knowledge of Linux
Solid scripting skills (e.g. Bash, Python)
Experience with performance diagnostics and monitoring

Want to help us help others? We’re hiring!

GoFundMe is the world’s most powerful community for good, dedicated to helping people help each other. By uniting individuals and nonprofits in one place, GoFundMe makes it easy and safe for people to ask for help and support causes—for themselves and each other. Together, our community has raised more than $40 billion since 2010.

Come join us! The GoFundMe team is searching for our next Site Reliability Engineer (SRE). You will be responsible for the full system lifecycle including infrastructure provisioning, system configuration, monitoring, and incident response in production environments. The SRE uses technical analysis to assess the availability, latency, scalability, and efficiency of a product or infrastructure and builds reliability into systems. To ensure the highest level of application performance and availability, the reliability engineer works closely with development teams, relevant functional operations teams, network engineers, database administrators, technology vendors and partners. The successful reliability engineer effectively guides incident responses, helps identify root causes and provides recommendations or solutions to mitigate and resolve issues.

Candidates considered for this role will be located in Buenos Aires, Argentina. The role requires working from the office 2-3 days a week.

The Job

Design and build out our cloud infrastructure (we run everything in AWS).
Participate in software and system performance analysis, tuning, and service capacity planning.
Manage the availability, scalability, security, and performance of our platform and applications.
Diagnose bottlenecks across the full stack and recommend interim workarounds while long-term solutions are investigated.
Periodically assess all monitoring requirements and implement enhancements to meet or exceed changing business needs.
Proactively review, recommend, and implement changes to the live infrastructure after ensuring the right validation has been carried out.
Work across engineering to improve the SLO/SLI framework.
Use data analysis to pick up trends before they become major problems.
Perform 24/7 on-call duties.

You

3+ years of experience in operating high-traffic SaaS environments.
Deep expertise in the mentality, processes, and tools needed to deliver high availability.
Skills to build a fully automated, highly elastic cloud orchestration framework on AWS.
Experience running containerized infrastructure in production (Kubernetes on EKS, AWS ECS).
Experience implementing configuration management and automation solutions using Infrastructure as Code, CI/CD, and GitOps (Ansible, Terraform, ArgoCD, GitHub Actions).
Strong working knowledge of Linux and its underlying components, system statistics, performance tuning, filesystems and IO.
Solid scripting skills (e.g. Bash, Python).
Experience with performance diagnostics, performance tuning, capacity planning, and monitoring.
BS in Computer Science or equivalent.
Good verbal and written communication skills.

Preferred

Building PCI compliant systems
Working with infrastructure for payment processing systems
Developing high-volume transaction systems
Passion for building fault tolerant and secure platforms

Technologies you are likely to be working with

AWS, Docker, Kubernetes, ECS, Helm, ArgoCD, Cloudflare, Terraform, Ansible, MySQL/Aurora, Nginx, Loft, DevSpace, Elasticsearch, Kafka, Redis, GitHub, Bash, Python, PHP, Java, Kotlin, Sumo Logic, New Relic, PagerDuty

Why you’ll love it here

Make an Impact: Be part of a mission-driven organization making a positive difference in millions of lives every year.
Innovative Environment: Work with a diverse, passionate, and talented team in a fast-paced, forward-thinking atmosphere.
Collaborative Team: Join a fun and collaborative team that works hard and celebrates success together.
Competitive Benefits: Enjoy competitive pay and comprehensive healthcare benefits.
Holistic Support: Enjoy financial assistance for things like hybrid work and family planning, along with generous parental leave, flexible time-off policies, and mental health and wellness resources to support your overall well-being.
Growth Opportunities: Participate in learning, development, and recognition programs to help you thrive and grow.
Commitment to DEI: Contribute to diversity, equity, and inclusion through ongoing initiatives and employee resource groups.
Community Engagement: Make a difference through our volunteering and Gives Back programs.

We live by our core values: impatient to be great, find a way, earn trust every day, fueled by purpose. Be a part of something bigger with us!

GoFundMe is proud to be an equal opportunity employer that actively pursues candidates of diverse backgrounds and experiences. We do not discriminate on the basis of race, color, religion, ethnicity, nationality or national origin, sex, sexual orientation, gender, gender identity or expression, pregnancy status, marital status, age, medical condition, mental or physical disability, or military or veteran status.

Individual pay is determined by work location and additional factors including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range based on your location during the hiring process.

If you require a reasonable accommodation to complete a job application or a job interview or to otherwise participate in the hiring process, please contact us at accommodationrequests@gofundme.com.

Global Data Privacy Notice for Job Candidates and Applicants:

Depending on your location, the General Data Protection Regulation (GDPR) or certain US privacy laws may regulate the way we manage the data of job applicants. Our full notice outlining how data will be processed as part of the application procedure for applicable locations is available here. By submitting your application, you are agreeing to our use and processing of your data as required.

Learn more about GoFundMe:

We’re proud to partner with GoFundMe.org, an independent public charity, to extend the reach and impact of our generous community, while helping drive critical social change. You can learn more about GoFundMe.org’s activities and impact in their FY ’24 annual report.

Our annual “Year in Help” report reflects our community’s impact in advancing our mission of helping people help each other.

For recent company news and announcements, visit our Newsroom.
