L2 Production Support Engineer (Gen AI/LLM-Based Applications)
P99Soft
Job Summary
This L2 Production Support Engineer role involves providing operational support for AI-driven and enterprise software applications. Key responsibilities include incident triage, root cause analysis, monitoring application health, supporting CI/CD deployments, and troubleshooting issues on cloud platforms. The role requires strong problem-solving skills and the ability to ensure high availability, reliability, and performance of production systems, including Docker/Kubernetes environments and various integration patterns. Participation in an on-call rotation is also required.
Must Have
- Provide L2 support for production applications
- Perform incident triage, root cause analysis, and issue resolution
- Monitor application health using logs, alerts, and dashboards
- Support CI/CD pipeline deployments and environment stability
- Troubleshoot issues with deployed services on cloud platforms (AWS, Azure, GCP)
- Collaborate with L3 engineering teams for complex problem resolution
- Ensure adherence to security protocols (OAuth, SSO, Entra ID, Okta)
- Maintain and troubleshoot Docker/Kubernetes-based deployments
- Support varied integration patterns (REST, SOAP, gRPC, WebSockets, batch, webhooks)
- Perform performance tuning, load testing analysis, and optimization
- Maintain documentation of issues, resolutions, runbooks, and support procedures
- Participate in an on-call rotation for 24x7 support
- Experience with React.js, Next.js, Java, .NET, Python
- Experience with AWS, Azure, GCP
- Experience with Git, branching/merging, pipelines
- Experience with Docker, Kubernetes
- Experience with APIs (REST/SOAP), gRPC, WebSockets, batch jobs
Good to Have
- Experience with Generative AI/LLM-based applications and platforms (Azure AI Studio, AWS Bedrock, Hugging Face)
- Exposure to RAG pipelines, data ingestion, cleansing, and evaluation
- Knowledge of IaC tools such as Terraform/Ansible for environment setup
- Experience supporting large-scale, data-driven AI/ML applications
Job Description
We are looking for an experienced L2 Production Support Engineer to provide operational support for AI-driven and enterprise software applications. The role requires strong problem-solving skills, a deep understanding of modern application stacks (frontend, backend, cloud, and integrations), and the ability to ensure high availability, reliability, and performance of production systems.
Key Responsibilities
- Provide L2 support for production applications, ensuring minimal downtime and quick resolution of incidents.
- Perform incident triage, root cause analysis, and issue resolution for application, infrastructure, and integration-related problems.
- Monitor application health using logs, alerts, and dashboards, and act proactively to prevent potential failures.
- Support CI/CD pipeline deployments, rollback handling, and environment stability.
- Work with cloud platforms (AWS, Azure, GCP) to troubleshoot issues with deployed services.
- Collaborate with L3 engineering teams for complex problem resolution and permanent fixes.
- Ensure adherence to security protocols (OAuth, SSO, Entra ID, Okta) in production environments.
- Maintain and troubleshoot Docker/Kubernetes-based deployments.
- Support varied integration patterns (REST, SOAP, gRPC, WebSockets, batch, webhooks).
- Perform performance tuning, load testing analysis, and optimization of applications.
- Maintain documentation of issues, resolutions, runbooks, and support procedures.
- Participate in an on-call rotation to provide 24x7 support for critical applications.
Required Experience & Skills
- Bachelor’s/Master’s degree in Computer Science, Engineering, or a related field.
- 5–8 years of experience in Production Support or Application Support (L2 role).
- Strong hands-on experience with:
  - Frontend & backend stacks: React.js, Next.js, Java, .NET, Python
  - Cloud platforms: AWS, Azure, GCP
  - CI/CD tools & VCS: Git, branching/merging, pipelines
  - Containers & orchestration: Docker, Kubernetes
  - Integration troubleshooting: APIs (REST/SOAP), gRPC, WebSockets, batch jobs
Beneficial / Nice-to-Have
- Experience with Generative AI/LLM-based applications and related platforms (Azure AI Studio, AWS Bedrock, Hugging Face).
- Exposure to RAG pipelines, data ingestion, cleansing, and evaluation.
- Knowledge of IaC tools such as Terraform/Ansible for environment setup.
- Experience supporting large-scale, data-driven AI/ML applications.
E-mail resume to srinivas.adepu@p99soft.com