Site Reliability Engineer

1 Month ago • 2 Years + • DevOps

About the job

Job Description

Seeking a self-motivated Site Reliability Engineer with 2+ years of experience in cloud computing and proven skills in designing and operating high-availability systems. Experience with AWS, Linux/Unix, containerization, and automation tools is a must. The ideal candidate will have a passion for reliability engineering and a strong understanding of monitoring, alerting, and logging systems.
Must have:
  • AWS experience
  • Linux/Unix skills
  • Containerization skills
  • Automation tools
Good to have:
  • Microservice arch
  • Python experience
  • SaaS monitoring
  • CloudFormation
Perks:
  • Flexible work
  • Cutting edge tech

Title: Site Reliability Engineering Lead

Location: Chennai, India

Department: Trimble Cloud Core Platform

Are you interested in cutting-edge cloud technologies and ready to get your hands dirty in the cloud world? Would you like to be part of a core team with industry-leading site reliability engineering standards?

What You Will Do

Are you a self-motivated and enthusiastic Site Reliability Engineer with hands-on experience in cloud computing? If so, our Trimble Cloud Core Platform division is looking for people like you to join our dynamic SRE team. You will join the Trimble Cloud Core Platform team to provision and operate our core engineering services in the public cloud.

  • Design, implement, and maintain high-availability and scalable systems, ensuring our platforms run smoothly 24/7 with minimal downtime.
  • Emphasize SRE as an engineering discipline driven by automation. Create and improve IaC and automation tools for continuous integration, deployment, and incident response, reducing manual work and improving response times.
  • Develop and maintain comprehensive monitoring, alerting, and logging systems to provide deep insights into system performance, identifying potential issues before they impact users.
  • Monitor system performance and usage, conducting capacity planning and scaling efforts to meet growing demands. Design cost controls and roll out the cost optimization strategy.
  • Own KPIs for site stability, performance, and root cause analysis (RCA) for production issues. Develop services for automatic incident and disaster recovery.
  • Participate in troubleshooting, capacity analysis, planning, and performance analysis.
  • Lead incident response efforts, perform root cause analyses, and implement post-mortem processes to prevent future issues and improve system resilience.
  • Handle escalations from internal stakeholders and manage critical issues to resolution.
  • Identify problems and opportunities for improvements that are common across many teams and services.
  • Fix compliance issues and address requirements raised by cybersecurity tools.
  • Adopt reliability engineering practices such as error budgets, blameless retrospectives, chaos engineering, etc.
  • Provide production operational support for our global service catalog.
  • Foster collaboration with software product development, architecture, and engineering teams to ensure releases are delivered with repeatable and auditable processes.
  • Ensure 24x7 coverage with business continuity principles.
  • Learn and be passionate about cloud computing
  • Evaluate and adopt emerging technologies to keep the solution on the cutting edge.
  • Mentor junior SREs and other engineering team members, sharing knowledge and promoting a culture of reliability, efficiency, and continuous learning.

What Skills & Experience You Should Bring

  • Bachelor's or Master's degree in Computer Engineering or a related field
  • At least 2 years of experience in a technical role
  • History of supporting applications and infrastructure in Production
  • Experience in capacity planning and cost optimization
  • Experience with Amazon Web Services (Azure or GCP acceptable)
  • Deep understanding of Linux/Unix operating systems
  • Experience building and deploying containers and serverless architectures
  • Familiarity with modern web application development and architecture
  • Experience using a high-level scripting language (Python preferred), IaC tools (Terraform, CloudFormation), and containerization

Desired Skills

  • AWS Certification (or equivalent in another public cloud)
  • Experience with microservice architecture
  • Expertise in Python or another high-level programming language
  • Experience with SaaS monitoring tool sets (Datadog, SumoLogic, PagerDuty, InfluxDB, Grafana)
  • Experience with CloudFormation, SAM templates, and Terraform
  • Experience with GitHub and Atlassian tools (Bitbucket, Jira, and Confluence)
  • Experience in Ansible and Packer
  • Experience using SQL and NoSQL databases
  • Experience with GitHub Actions, Jenkins, Azure DevOps, and Gradle for CI/CD

About Our Location

The global pandemic fundamentally changed the way we think about work and the workplace.

We created a Flexible Work Arrangement (FWA) Program to provide a framework for flexibility in where, when, and how we work.

Trimble’s new office in Chennai features state-of-the-art infrastructure and facilities and will enable Trimble to better serve its customers and partners from around the world. The 300,000-square-foot Class A office space, with 50 meeting rooms and seating for nearly 2,000 staff, allows for effective social distancing and compliance with local Covid guidelines.

Offering employees greater flexibility, the Chennai office will provide a hybrid working model.

About Our Trimble Cloud Core Division

Trimble Cloud Core Platform is leading Connect & Scale. As part of Trimble's Office of Digital Transformation, we create cloud-first workflows that enable Trimble's customer-centric approach in digital transformation.

Our products and common core services connect data, users, and applications across the enterprise. Our central approach enables Trimble to achieve scale, collaboration, enterprise security, and cost-efficiency.

Trimble’s Inclusiveness Commitment

We believe in celebrating our differences. That is why our diversity is our strength. To us, that means actively participating in opportunities to be inclusive. Diversity, Equity, and Inclusion have guided our current success while also moving our desire to improve. We actively seek to add members to our community who represent our customers and the places we live and work.

We have programs in place to make sure our people are seen, heard, and welcomed and most importantly that they know they belong, no matter who they are or where they are coming from.

Our Company

Trimble is transforming the way the world works by delivering products and services that connect the physical and digital worlds. Core technologies in positioning, modeling, connectivity, and data analytics enable customers to improve productivity, quality, safety, and sustainability. From purpose-built products to enterprise lifecycle solutions, Trimble software, hardware, and services are transforming a broad range of industries such as agriculture, construction, geospatial, and transportation and logistics. For more information about Trimble (NASDAQ: TRMB), visit www.trimble.com.
