Site Reliability Engineer - Cloud Platform

4 Months ago • Up to 10 Years • DevOps

Job Summary

Job Description

Guidewire is seeking a Site Reliability Engineer to ensure the reliability of its cloud platform and InsuranceSuite products. Must-haves include strong automation skills with Bash, Python, or Go, familiarity with Agile development, and deep Linux systems knowledge. Experience with AWS, Kubernetes, and containerization is crucial.
Must have:
  • AWS Experience
  • Linux Systems
  • Automation Skills
  • Containerization
Good to have:
  • Terraform/Terragrunt
  • DevOps/GitOps
  • Microservices
  • Single Sign-On
Perks:
  • Hybrid Work
  • Travel Opportunities

Job Details

About the job

At Guidewire, we make software that offers Property and Casualty (P&C) Insurance companies the tools to take care of their customers when they need it the most, whether that’s a time of crisis, a natural disaster, an accident, or exposure to cyber risks. We build the core applications that insurance companies use to sell and underwrite policies, settle claims, and bill their customers. We also have a portfolio of innovative products serving the needs of P&C insurance companies in areas such as data management, digital online portals, and predictive analytics. We run these products on the Guidewire Cloud Platform, and we help hundreds of insurance providers all over the world to handle billions of dollars of business.

We are proud to be voted a Top Cloud Employer on Glassdoor by our own employees and positioned as a market leader by industry experts like Gartner. We have a fun work environment and a culture that lives by our core values of integrity, rationality, and collegiality.

We’re searching for people who are as passionate about working together to deliver quality products and support as we are. Join us and enjoy a career where you can make an impact. You’ll be inspired by those around you, and you’ll be trusted and empowered to go further.

As a Site Reliability Engineer, you will be part of a team that is passionately automating everything possible to make Guidewire systems run more efficiently. The Platform team is dedicated full-time to creating and running software that improves the reliability of systems in production, serving hundreds of customers and supporting millions of transactions each day. You will be ensuring the reliability of Guidewire’s flagship cloud platform and InsuranceSuite products and building tooling to help ensure efficient operations and optimal availability of all SaaS multi-tenant and customer-focused systems. Platform SREs collaborate closely with Guidewire’s core product developers to ensure that the Guidewire core cloud products address functional and non-functional requirements such as availability, performance, observability, and maintainability.

This role requires a high degree of collaboration, teamwork, ownership, and responsibility. If you like to be challenged and have a passion for solving problems at scale with systems like AWS, Kubernetes, and Aurora, then we would love to hear from you. The ideal candidate is someone who exemplifies the ethos of "If you have to do something more than once, automate it," and who can rapidly self-educate on new concepts and tools. Bonus points if you have prior experience doing production support of a SaaS platform and are comfortable working with bleeding-edge, highly containerized cloud-native environments in AWS.

Essential Duties And Responsibilities

  • Take a purist SRE approach to shared multi-tenant infrastructure for resilient, microservice-based, containerized SaaS systems, in addition to customer-centric application environments
  • Oversee and automate the team’s growing presence in AWS
  • Contribute features, bug fixes, and reliability improvements to core infrastructure systems development
  • Platform reliability engineering of a complex single sign-on SAML/OAuth-based central authentication platform
  • Creatively build and develop tooling to aid in driving 24x7x365 follow-the-sun operations of critical production systems
  • Automate deployment tasks for core product and infrastructure tools and maintain automation infrastructure
  • Create system documentation and training materials to empower and educate our fellow team members
  • Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure
  • Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks and issues
  • Enhance platform observability by helping create a self-healing approach to platform reliability
  • Collaborate with engineering teams, providing product feedback and, where necessary, contributing code to the product



Required Skills And Experience

Education and Work Experience

  • Bachelor’s Degree in Computer Science or related field
  • Software engineering and task automation skills with Bash, Python, and/or Go are a must
  • Familiarity with the Agile software development lifecycle
  • Deep background with Linux systems and engineering
  • Highly experienced with engineering and automating on Amazon Web Services (AWS)
  • Experience supporting web applications running on Java / Apache / Tomcat in a live production environment
  • Prior experience with IaC tools like Terraform/Terragrunt/Terraspace
  • Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for gate promotions
  • Background supporting production at scale in a heavily microservice-based environment
  • Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking)
  • Strong understanding of Single Sign-On, SAML, and OAuth (bonus if you have hands-on experience with Okta)
  • Seasoned expertise in X.509 certificate technology and basic concepts of encryption
  • Experience working with Relational Databases such as Aurora Postgres and/or Oracle RDS
  • Advanced exposure to application development, web UI (design and development), JSON, application architecture
  • Strong experience using observability tools (logging/APM) like Datadog, CloudWatch, and PagerDuty
  • Familiarity with event store/stream-processing technologies like Kafka or AWS SQS
  • Understanding of Open Application Model systems such as KubeVela or Crossplane

Personal Qualities and Soft Skills

  • You greatly prefer writing code to clicking through a GUI
  • You enjoy teaching, being a mentor to others, and working across boundaries
  • Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving
  • Strong analytical mind with a penchant for process development and enhancement
  • A highly positive, can-do attitude and a desire to be a team player
  • Great communication skills and ability to explain complex technical concepts to a varied audience
  • Strong follow-through and work ethic; you consistently keep and meet commitments

Other Requirements

  • Ability to read, write, and speak English
  • We provide 24x7 support to our customers, so we expect you to take turns with your teammates being on-call for weekend production emergencies or to provide rotating weekend operational support
  • Travel – Expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings
