Senior Software Engineer - Site Reliability

2 Hours ago • 7 Years +

Job Summary

Job Description

The Software Site Reliability Engineer at Vertex ensures enterprise-wide systems are reliable, scalable, and performant by relentlessly measuring and improving environments. They lead and guide teams to implement new software and system capabilities, enhance code, and optimize processes and tools. Leveraging deep infrastructure and software engineering expertise, they build reliable solutions and refactor legacy systems for improved reliability. This role involves driving reliability initiatives, designing and optimizing systems, and fostering a culture of reliability through mentorship. They are also involved in incident management and technical leadership, ensuring solutions align with best practices. The candidate will also participate in CI/CD processes and Agile practices, and mentor organizational software engineering staff.
Must have:
  • Design and deliver SaaS solutions in AWS, Azure, OCI, or GCP
  • Experience with Java, Spring Boot, .NET Core, MVC, JavaScript
  • Experience with OpenTelemetry, Datadog, and CloudWatch
  • Experience with Kubernetes, ArgoCD, Helm, and Terraform
  • CI-powered performance and synthetic testing to augment a shift-left testing strategy

Job Details

Job Description:

The Software Site Reliability Engineer at Vertex ensures enterprise-wide systems are reliable, scalable, and performant by relentlessly measuring and improving environments. They lead and guide teams to implement new software and system capabilities, enhance code, and optimize processes and tools. Leveraging deep infrastructure and software engineering expertise, they build reliable solutions from inception or refactor legacy systems for improved reliability. Success is driven by data, customer satisfaction, and empowering teams to achieve excellence. 

ESSENTIAL JOB FUNCTIONS AND RESPONSIBILITIES 

  • Drive Reliability: Drive initiatives that enhance system reliability and operational efficiency, guiding teams in implementing code and system design reliability improvements and efficiencies. 

  • Design Optimization: Guide teams in designing, developing, and implementing optimized and efficient systems and environments, ensuring performance, reliability, and scalability.

  • Observation and Alerting: Influence teams in designing and implementing applications and systems that put reliability, monitoring, alerting, and analytics first. 

  • Performance Metrics: Guide teams in measuring the health and performance of environments using observability tools, ensuring accurate and actionable metrics. 

  • Culture of Reliability: Foster a culture of reliability and operational excellence through mentorship and training, ensuring consistent implementation of SRE principles. 

  • Incident Management: Guide teams in triaging, isolating, and resolving environmental issues expediently and openly, in accordance with incident response protocols and procedures.

  • Proactive Resolution: Guide teams to anticipate and correct production issues, including outages, processing slowdowns, errors, and failures, using incident management best practices. Ensure teams minimize downtime and recover rapidly.

  • Technical Leadership: Provide technical leadership for projects, ensuring solutions align with reliability best practices and organizational goals. 

  • Standards & Practices: Develop and publish standards and best practices, guiding teams to implement observability and monitor system performance effectively. 

  • Reliability Feedback: Capture and document engineering and operations case studies to refine published SRE software policies and best practices. 

  • CI/CD Reliability: Guide teams in building and delivering reliability starting from Continuous Integration (CI) and Continuous Deployment (CD) processes, ensuring robust and reliable software delivery pipelines. 

  • Agile Practices: Participate in the planning, prioritization, and breakdown of team deliverables to ensure that they meet organizational reliability and quality outcomes.

  • Mentorship: Guide and mentor organizational software engineering staff, developing their technical skills and knowledge of Site Reliability patterns and practices. 

KNOWLEDGE, SKILLS AND ABILITIES 

Candidates must possess advanced proficiency in the following:

Technical 

  • Design and delivery of highly reliable SaaS solutions hosted in AWS, Azure, OCI, or GCP 

  • Software Development frameworks using Java, Spring Boot, .NET Core, MVC, JavaScript  

  • Designing and delivering highly observed, reliable and recoverable enterprise event-driven systems 

  • Deep observability and monitoring experience with OpenTelemetry, Datadog, and CloudWatch

  • Infrastructure, application and synthetic monitoring and alerting techniques and patterns 

  • Institutionalization of application and system metrics with KPIs, SLIs and SLOs 

  • Observable and reliable relational storage solutions with Postgres, MSQL, or similar 

  • Observable and reliable non-relational database technologies and cloud storage like AWS S3 

  • Observable and reliable containerized applications using Kubernetes, ArgoCD, Helm, and Terraform

  • CI-powered performance and synthetic testing to augment a shift-left testing strategy

  • CD experience using GitHub Actions, Terraform, Go, PowerShell and/or Python  

  • Exposure to AI-assisted pair programming with GitHub Copilot or similar tools

  • Optimizing applications at scale for network, memory, and I/O performance

Interpersonal 

  • Results-oriented and customer-focused, acting with urgency and purpose. 

  • Ability to make data-driven decisions guided by commitment to customer outcomes. 

  • Strong time management and cross-team partnership ensuring alignment in commitments. 

  • Adaptive verbal and listening skills, being clear and concise while practicing empathy to foster trust and provide meaningful feedback. 

  • Strong written and presentation skills, representing various viewpoints. 

  • Passionate hunger for learning and applying emerging technologies. 

  • Proven ability to root cause system issues and create/own remediation plans. 

EDUCATION AND TRAINING

  • An undergraduate degree, preferably in Computer Science or a similar technical degree. 

  • 7+ years of experience in technology related roles. 

  • 4+ years of experience in a DevOps culture or production SaaS environment. 

Other Qualifications 

The Winning Way behaviors are those that all Vertex employees need in order to meet the expectations of each other, our customers, and our partners.

  • Communicate with Clarity - Be clear, concise and actionable. Be relentlessly constructive. Seek and provide meaningful feedback. 

  • Act with Urgency - Adopt an agile mentality - frequent iterations, improved speed, resilience. 80/20 rule – better is the enemy of done. Don’t spend hours when minutes are enough. 

  • Work with Purpose - Exhibit a “We Can” mindset. Results outweigh effort. Everyone understands how their role contributes. Set aside personal objectives for team results. 

  • Drive to Decision - Cut the swirl with defined deadlines and decision points. Be clear on individual accountability and decision authority. Guided by a commitment to and accountability for customer outcomes. 

  • Own the Outcome - Define milestones, commitments and intended results. Assess your work in context; if you’re unsure, ask. Demonstrate unwavering support for decisions.

About The Company

Vertex is an Affirmative Action and Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability. If you'd like to view a copy of the company's affirmative action plan, please email AskHR@vertexinc.com. If you are an individual with a disability and would like to request a reasonable accommodation as part of the employment selection process, please contact 610-640-4200 or AskHR@vertexinc.com.