Site Reliability Engineer

10-12 Years

DevOps

Job Description

This role is for a Site Reliability Engineer in the Hosting stream in SWISS, within the Integration crew's Events POD for the messaging team. The team focuses on identifying improvements and engineering better solutions in a supportive, agile environment. Key responsibilities include determining the reliability of digital products, technology services, and infrastructure; minimizing the risk and impact of failures through operational improvements such as predictive monitoring and auto-scaling; responding to production incidents; analyzing operational data to identify improvements; and applying engineering practices for reliability, quality, security, and compliance.

Good To Have:

Knowledge of other messaging products such as TIBCO EMS or Kafka would be advantageous.
Bachelor's or master's degree, or equivalent, with a focus on IT or computer engineering.
Scripting experience would be advantageous.
Understanding or experience with SRE/DevOps models would be an advantage.

Must Have:

10-12 years of experience in a similar position focused on either IBM MQ distributed or mainframe MQ infrastructure.
Excellent technical skills in IBM MQ. Ideally these skills will have been gained in a complex operational environment where security and risk aversion are important.
Good Unix (Linux) skills.
Knowledge of Amelia or other automation tools.
Possess ITIL knowledge.
Possess good general PC skills.
Have a track record of suggesting and implementing strategies that increase system reliability and performance.
Possess the ability to solve complex issues, with strength in problem statement analysis and solution design thinking.
Show a proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
Be a team player with good organizational and communication skills.
Have good written and verbal communication skills in English.
Can interact well with both Business and IT personnel.
Can take responsibility for allocated tasks and see them through to successful completion.
Be interested in learning new technologies and practices, reusing strategic platforms and standards, evaluating options, and making decisions with long-term sustainability in mind.
Strong communicator, from making presentations to technical writing.

Project description

We are a group of professionals who enjoy identifying areas of improvement and engineering better solutions. As a crew, we do our best to create a supportive environment where each of us feels appreciated and has a chance to develop professionally.

We are looking for candidates to take the role of Site Reliability Engineer.

Your team:

In our agile operating model, crews are aligned to larger products and services fulfilling client needs and encompass multiple autonomous pods. You'll be working in the Hosting stream in SWISS, within the Integration crew, in the Events POD for the messaging team.

The Integration crew supports our POD members with a hybrid and flexible working model. There is a well-organized onboarding process that ensures you have a good start and settle into the new role smoothly. In joining us, you will have multiple learning opportunities to grow and participate in initiatives focused on your passions.

Responsibilities

determine the reliability of our digital products, technology services, and the infrastructure that underpins them
minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto-scaling, or self-healing
respond to production incidents to gain first-hand experience of operational hotspots and to identify the root causes of problems
collect and analyze operational data, define and monitor key metrics to identify and communicate areas for improvement
apply a broad range of engineering practices with a focus on reliability, from instrumentation, performance analysis, and log analytics to automated testing, deployment, and operations
ensure the quality, security, reliability, and compliance of our solutions by applying our digital principles and implementing both functional and non-functional requirements
