Contract: Principal Site Reliability Engineer

1 Month ago • 10 Years + • Devops

Job Summary

Job Description

Upwork is the world's work marketplace, connecting startups and Fortune 100 companies with skilled professionals. This role, within the Hybrid Workforce Solutions (HWS) Team, is a full-time position (~40 hours per week) focused on supporting Upwork's business. The engineer will serve as a technical leader in SRE practices, focusing on zero-trust infrastructure, observability, and cloud-native scalability. Responsibilities include guiding architectural evolution of reliability systems, championing SLO-driven engineering, partnering with platform and security teams, developing AI-assisted tools, defining observability strategies, driving infrastructure automation, leading post-incident reviews, and mentoring engineers. The role involves participation in a production on-call rotation.
Must have:
  • 10+ years in SRE/DevOps/production engineering
  • Kubernetes operations expertise (multi-cluster, service mesh)
  • GitOps pipeline experience (ArgoCD/Flux)
  • Observability tooling fluency (Prometheus, Grafana)
  • Familiarity with reliability-as-code and automation (Python, Go)
  • Zero-trust authentication and mTLS policy experience
  • Incident review and standardization experience
  • Cross-functional collaboration skills
Good to have:
  • Zero-trust infrastructure
  • Platform observability
  • Cloud-native scalability
  • Multi-cluster Kubernetes environments
  • Service mesh integration
  • SLO-driven engineering
  • AI-assisted tools and workflows
  • Policy-as-code
  • Workload identity
  • Platform governance

Job Details

Upwork ($UPWK) is the world’s work marketplace. We serve everyone from one-person startups to over 30% of the Fortune 100 with a powerful, trust-driven platform that enables companies and talent to work together in new ways that unlock their potential. Last year, more than $3.8 billion of work was done through Upwork by skilled professionals who are gaining more control by finding work they are passionate about and innovating their careers.

This is an engagement through Upwork’s Hybrid Workforce Solutions (HWS) Team. Our Hybrid Workforce Solutions Team is a global group of professionals who support Upwork’s business. Our HWS team members are located all over the world. This is an opportunity to work with a major revenue-producing website with millions of users. In addition to making sure everything works, you are also expected to contribute to the continuous improvement of our environment. This is a full-time position (~40 hours per week, Monday-Friday). This role participates in our production on-call rotation during your daytime hours and on some weekends (once every 2-3 weeks).


Work/Project Scope:

  • Serve as a technical leader in modern SRE practices with a focus on zero-trust infrastructure, platform observability, and cloud-native scalability.
  • Guide the architectural evolution of reliability systems, including multi-cluster Kubernetes environments, GitOps workflows, and service mesh integration.
  • Champion SLO-driven engineering across teams and establish frameworks for defining, tracking, and enforcing reliability standards.
  • Partner with platform and security teams to enable service-to-service authentication, policy enforcement, and resilient control planes.
  • Develop AI-assisted tools and workflows (e.g., for incident triage, RCA generation, auto-remediation) to reduce operational burden and accelerate resolution.
  • Define and maintain end-to-end observability strategies including distributed tracing, metrics pipelines, and log enrichment.
  • Drive infrastructure automation efforts using IaC best practices, with an emphasis on policy-as-code, workload identity, and platform governance.
  • Lead post-incident reviews and reliability audits to surface systemic gaps and drive continuous improvement.
  • Mentor engineers across infrastructure and application teams on designing and operating reliable, scalable systems.

Must Haves (Required Skills):

  • 10+ years in SRE, DevOps, or production engineering roles, including experience operating large-scale distributed systems in production
  • Deep expertise in Kubernetes operations, including multi-cluster orchestration, service mesh (Istio or equivalent), and workload policy management (e.g., OPA, Kyverno)
  • Proven experience building and maintaining GitOps pipelines using tools like ArgoCD or Flux
  • Strong fluency in observability tooling (e.g., Prometheus, OpenTelemetry, Grafana, or Datadog), with a focus on SLO-based alerting and incident detection
  • Familiarity with reliability-as-code practices and automation using scripting languages (Python, Go, or Bash) and AI-enhanced workflows (e.g., Cursor, incident bots, PR-generating agents)
  • Experience designing and enforcing zero-trust service-to-service authentication, workload identity, and mTLS policies
  • Track record of leading incident review programs, standardizing postmortems, and driving systemic reliability improvements
  • Ability to work cross-functionally with platform, security, and developer enablement teams to embed resilience across the SDLC

Upwork is proudly committed to fostering a diverse and inclusive workforce. We never discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical condition), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.   

To learn more about how Upwork processes and protects your personal information as part of the application process, please review our Global Job Applicant Privacy Notice.
