Project description
We are a group of professionals who enjoy identifying areas of improvement and engineering better solutions. As a crew, we do our best to create a supportive environment where each of us feels appreciated and has a chance to develop professionally.
We are looking for candidates to take the role of Site Reliability Engineer.
Your team:
In our agile operating model, crews are aligned to larger products and services fulfilling client needs and encompass multiple autonomous pods. You'll be working in the Hosting stream in SWISS within the Integration crew in the Events POD for the messaging team.
The Integration crew supports our POD members with a hybrid and flexible working model. There is a well-organized onboarding process that ensures you have a good start and settle into the new role smoothly. In joining us, you will have multiple learning opportunities to grow and participate in initiatives focused on your passions.
Responsibilities
- determine the reliability of our digital products, technology services, and the infrastructure that underpins them
- minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto scaling or self-healing
- respond to production incidents to gain first-hand experience of operational hotspots and to identify the root causes of problems
- collect and analyze operational data, define and monitor key metrics to identify and communicate areas for improvement
- apply a broad range of engineering practices with a focus on reliability, from instrumentation, performance analysis, and log analytics to automated testing, deployment, and operations
- ensure the quality, security, reliability, and compliance of our solutions by applying our digital principles and implementing both functional and non-functional requirements
Skills
Must have
- 10-12 years of experience in a similar position focused on either IBM MQ distributed or mainframe MQ infrastructure
- Have excellent technical skills in IBM MQ. Ideally these skills will have been gained in a complex operational environment where security and risk aversion are important.
- Have good Unix (Linux) skills.
- Knowledge of Amelia or other automation tools.
- Possess ITIL knowledge
- Possess good general PC skills
- Have a track record of suggesting and implementing strategies that increase system reliability and performance
- Possess the ability to solve complex issues, with strong problem statement analysis and solution design thinking
- Show a proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Be a team player with good organizational and communication skills
- Have good written and verbal communication skills in English
- Can interact well with both Business and IT personnel
- Can take responsibility for allocated tasks and see them through to successful completion
- Be interested in learning new technologies and practices, reusing strategic platforms and standards, evaluating options, and making decisions with long-term sustainability in mind
- Strong communicator, from making presentations to technical writing
Nice to have
- Knowledge of other products like TIBCO EMS or Kafka would be advantageous
- Bachelor's or master's degree or equivalent focusing on IT or computer engineering
- Scripting experience would be advantageous
- Understanding of or experience with SRE/DevOps models would be an advantage
Other
Languages
English: C1 Advanced
Seniority
Senior