Level 2 Engineer

Thales

5+ Years | Singapore (Hybrid) | Full Time | 1 week ago

Job Summary

As a Level 2 Engineer at Thales in Singapore, you will lead and coordinate operational support for mission-critical applications and infrastructure, ensuring SLA adherence and system availability. Your responsibilities include incident and problem management, acting as incident manager for P1/P2 issues, performing root cause analysis, and coordinating resolutions. You will also be involved in change management, patch management, documentation, configuration management, and testing, ensuring operational readiness and compliance. This role requires strong technical skills in various technologies like Kubernetes, Docker, Kafka, and RHEL, along with leadership and communication abilities.

Must Have

Lead and coordinate level 2 support operations for mission-critical applications and infrastructure.
Provide troubleshooting and diagnostics for incidents escalated from level 1.
Ensure adherence to SLA and system availability.
Act as incident manager for P1/P2 issues, coordinating resolution and communications.
Perform root cause analysis and recommend permanent fixes.
Perform operational impact assessments and participate in the CAB to review and approve changes.
Perform patch management readiness and stakeholder coordination.
Create and update operational documentation, SOPs, Incident response checklist, RCA, PIR, monitoring and alert guidebook.
Validate the accuracy of configurations and verify CMDB assets.
Ensure operational readiness testing before production deployment rollout.
Gather logs and system metrics from the time of failure for root cause analysis.
At least 5 years in Level 2 support for mission critical 24x7 production support.
At least 2 years in a team lead or supervisory role.
Proven experience in handling P1/P2 incidents, managing post-incident reviews (PIRs) and root cause analysis.
Knowledge of Operating Systems (RHEL, Windows Server), Networking Fundamentals, Middleware & Infrastructure (Nginx, Kubernetes, Docker, Spring Boot), Message Queues (IBM MQ, Kafka), Database (SQL Server, PostgreSQL).
ITIL/ITSM Process Knowledge, Security Awareness, DR and HA concepts.
Strong Technical Skills, Leadership & Coordination, Communication & Collaboration, Operational Governance.

Good to Have

Certification in Red Hat Enterprise Linux or Kubernetes preferred.
Public sector experience preferred.

Job Description

Remote type: Hybrid

Locations: Singapore

Time type: Full time

Posted on: Posted Today

Job requisition ID: R0307166

Thales is a global technology leader trusted by governments, institutions, and enterprises to tackle their most demanding challenges. From quantum applications and artificial intelligence to cybersecurity and 6G innovation, our solutions empower critical decisions rooted in human intelligence. Operating at the forefront of aerospace and space, cybersecurity and digital identity, we’re driven by a mission to build a future we can all trust.

In Singapore, Thales has been a trusted partner since 1973, originally focused on aerospace activities in the Asia-Pacific region. With 2,000 employees across three local sites, we deliver cutting-edge solutions across aerospace (including air traffic management), defence and security, and digital identity and cybersecurity sectors. Together, we’re shaping the future by enabling customers to make pivotal decisions that safeguard communities and power progress.

KEY ACTIVITIES AND RESPONSIBILITIES

As a Level 2 Engineer, you are accountable for:

Operational Support

Lead and coordinate level 2 support operations for mission-critical applications and infrastructure

Provide troubleshooting and diagnostics for incidents escalated from level 1

Ensure adherence to SLA and system availability

Incident & Problem Management

Act as incident manager for P1/P2 issues

Coordinate resolution and communications

Perform root cause analysis and recommend permanent fixes

Escalate unresolved issues that require software coding to Level 3 or engineering teams

Change Management

Perform operational impact assessment

Participate in the CAB to review and approve changes

Pre-change preparation, such as reviewing the Change Request and Release Plan

Supervise post-change production verification

Documentation update and knowledge transfer

Post change review and feedback

Patch Management

Perform patch management readiness

Stakeholder coordination and team coordination

System Readiness and Post-Patch Validation

Documentation update and knowledge transfer

Compliance and audit readiness

Documentation and Compliance

Operational documentation: SOPs, incident response checklist, RCA, PIR, monitoring and alert guidebook

Configuration & infrastructure documentation: system configuration baseline, application dependency maps, environment inventories such as hosts, services, accounts

Knowledge Base Articles for level 2 enablement and faster resolution e.g. Known Errors and Fixes, Frequent How-To Guides, Script Repositories, Lessons Learned

Knowledge Management

Configuration Management

Validate the accuracy of configurations

Maintain readiness of operational documentation

Perform audits to confirm compliance of configurations

CMDB asset verification

Change-linked configuration tracking

Ensure environment consistency between DEV – IVVQ – ISO-PROD – UAT and PROD

Testing and Verification

Ensure operational readiness testing before production deployment rollout

Ensure post-change verification coordination

Perform regression and sanity tests following patching or upgrades, in UAT and PROD

Participation in user acceptance testing

Knowledge Management

Documentation of resolution

Knowledge Base Contribution

Validation of knowledge

Subject Matter Expertise Sharing

Root Cause Analysis

Gather logs and system metrics from the time of failure

Reproduce issues in a controlled environment to understand the conditions under which they occur

Determine the scope and severity in terms of the systems affected, downtime duration and business impact

Narrow down the possible sources of the failure

Use diagnostic tools to analyse the application behaviour

Correlation of events to sequence the chain of events leading up to the failure and identify the dependencies
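
As an illustration of the event-correlation step, the following is a minimal Python sketch, not a description of any Thales tooling. It merges events from several sources into one ordered timeline covering a window before the failure. The `Event` type, the `build_timeline` function, the source names and the sample messages are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Event(NamedTuple):
    """One log or metric event (hypothetical structure for illustration)."""
    timestamp: datetime
    source: str
    message: str


def build_timeline(events, failure_time, lookback):
    """Return events within `lookback` before `failure_time`, oldest first."""
    window_start = failure_time - lookback
    in_window = [e for e in events if window_start <= e.timestamp <= failure_time]
    return sorted(in_window, key=lambda e: e.timestamp)


# Hypothetical sample events gathered from different systems, in arbitrary order.
events = [
    Event(datetime(2024, 1, 1, 10, 4, 30), "app", "HTTP 503 returned to clients"),
    Event(datetime(2024, 1, 1, 10, 2, 0), "kafka", "consumer lag rising"),
    Event(datetime(2024, 1, 1, 9, 0, 0), "os", "routine cron job finished"),
    Event(datetime(2024, 1, 1, 10, 3, 15), "db", "connection pool exhausted"),
]

failure = datetime(2024, 1, 1, 10, 5, 0)
timeline = build_timeline(events, failure, timedelta(minutes=10))

for e in timeline:
    print(e.timestamp.isoformat(), e.source, e.message)
```

Sorting by timestamp across sources makes the order of events explicit (here: Kafka lag, then database pool exhaustion, then application errors), which helps when identifying dependencies. It assumes the clocks of the contributing systems are synchronised.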

KAST (Kubernetes Analytics Stack)

Thales's proprietary Kubernetes-based platform that provides a foundational digital infrastructure across Thales business domains

Kubernetes

Kubernetes is an open-source platform, originally developed by Google, for automating the deployment, scaling, and management of containerized applications (typically Docker containers).

Docker

Docker is a platform for building and running applications in containers. Docker Compose is a tool for defining and running multi-container Docker applications using a single configuration file (docker-compose.yml). It allows you to define, manage, and run multiple interconnected Docker containers as a single service stack.

Kafka

Apache Kafka is a high-performance distributed streaming platform used for building real-time data pipelines, stream processing, and event-driven architectures.

EMQX

EMQX is an MQTT broker that acts as a message middleware between publishers (e.g., sensors, devices) and subscribers (e.g., apps, dashboards, databases) using the MQTT protocol, which is a lightweight publish-subscribe messaging protocol ideal for low-bandwidth, high-latency, or constrained devices.

Elasticsearch

Elasticsearch is a distributed, open-source search and analytics engine built on top of Apache Lucene. It is widely used for full-text search, log and event data analysis, and real-time data exploration.

MinIO

MinIO is a high-performance, distributed object storage system that stores data as objects (like files, images, videos, backups) in buckets.

Zookeeper

Apache ZooKeeper is an open-source coordination service for distributed applications. It provides a highly reliable, consistent, and available mechanism to store metadata, configuration, and state information. It complements Apache Kafka by acting as a metadata management and coordination layer in Kafka’s traditional architecture. ZooKeeper ensures reliability, consistency, and fault-tolerance in Kafka’s distributed setup.

Spark

Apache Spark is an open-source, distributed computing system designed for fast, large-scale data processing. It was built for performance, especially for iterative algorithms in data science and machine learning.

RHEL

RHEL is a certified Linux operating system optimized for reliability, scalability, and security in business and production environments.

Ansible

Ansible is an open-source IT automation tool developed by Red Hat that simplifies the management of servers, applications, and infrastructure. It allows DevOps and system administrators to automate tasks such as configuration management, software deployment, and orchestration. It uses simple, human-readable YAML files (called playbooks) and connects to managed hosts over SSH.

Prometheus

Open-source monitoring and alerting toolkit that is used to collect, store and query metrics, for the monitoring of infrastructure, services, containers and microservices

Grafana

Open-source analytics and visualization platform used for monitoring, observability, and alerting. Commonly used with Prometheus

KEY KNOWLEDGE AND EXPERIENCE

To be successful in your role, you will have demonstrated and/or acquired the following knowledge and experience:

Education and Experience

Bachelor's degree in Information Technology, Computer Science, Engineering, or a closely related discipline

At least 5 years in Level 2 support for mission critical 24x7 production support, preferably in public sector

At least 2 years in a team lead or supervisory role, coordinating tasks and mentoring junior engineers

Proven experience in handling P1/P2 incidents, managing post-incident reviews (PIRs) and root cause analysis

Preferably certification in Red Hat Enterprise Linux or Kubernetes

Knowledge / Skills

Operating Systems: RHEL (90%) and Windows Server (10%)

Networking Fundamentals

Middleware & Infrastructure (Web Server – Nginx, App Servers – Kubernetes with containers (Docker + Spring Boot))

Message Queues (IBM MQ, Kafka)

Database (SQL Server, PostgreSQL)

ITIL/ITSM Process Knowledge

Security Awareness

DR and HA concepts

Strong Technical Skills

Leadership & Coordination

Communication & Collaboration

Operational Governance

At Thales, we’re committed to fostering a workplace where respect, trust, collaboration, and passion drive everything we do. Here, you’ll feel empowered to bring your best self, thrive in a supportive culture, and love the work you do. Join us, and be part of a team reimagining technology to create solutions that truly make a difference – for a safer, greener, and more inclusive world.
