Technical Support L2

Growe

1-3 Years | Remote | Full Time | 2 months ago

Job Summary

The Technical Support L2 role at Growe involves responding to production incidents, diagnosing issues using monitoring tools like Grafana and Prometheus, and analyzing logs with OpenSearch/Kibana. Responsibilities include escalating complex problems, maintaining documentation, and participating in on-call rotations. Candidates need 1-3 years of experience in L2 support or SRE, with proficiency in monitoring, logging, and alerting systems, and familiarity with AWS services.

Must Have

Respond to production incidents within defined SLAs.
Use monitoring tools (Grafana, VictoriaMetrics/Prometheus) to identify and diagnose issues.
Use logging tools (OpenSearch/Kibana) for comprehensive log analysis.
Monitor and respond to alerts from PagerDuty or Grafana On-call.
Escalate complex issues to appropriate specialized teams with clear context.
Create, maintain, and update runbooks and incident documentation.
Provide clear, timely communication during incidents.
Participate in on-call rotations.
1-3 years in L2 support, Site Reliability Engineering or technical support.
Hands-on experience with Grafana, VictoriaMetrics, Prometheus.
Proficiency with OpenSearch (Kibana web interface).
Experience with PagerDuty, Grafana On-call.
Strong analytical skills for identifying issues.
Clear written and verbal communication skills.
Understanding of cloud concepts and familiarity with AWS services (EC2, EKS, RDS, S3).
Systematic approach to problem-solving and incident response procedures.
Ability to quickly learn and effectively use monitoring, logging, and alerting tools.

Good to Have

Strong desire to learn new technologies and tools, with ability to adapt to changing monitoring and alerting systems.
Ability to interpret metrics, logs, and system behavior to make informed decisions.
Attention to detail: ensuring accuracy in infrastructure changes, configurations, and deployment processes.
Effective communication: the ability to explain technical concepts clearly to both technical and non-technical stakeholders.

Job Description

##### Growe welcomes those who are excited to:

Respond to production incidents within defined SLAs and provide rapid problem identification and initial resolution;
Use monitoring tools (Grafana, VictoriaMetrics/Prometheus) to identify and diagnose issues in a microservices architecture;
Use logging tools (OpenSearch/Kibana) for comprehensive log analysis and root cause investigation;
Monitor and respond to alerts from PagerDuty or Grafana On-call, ensuring proper escalation and communication;
Escalate complex issues to appropriate specialized teams (DevOps, SystemRE, PlatformRE) with clear context and documentation;
Create, maintain, and update runbooks, troubleshooting guides, and incident documentation;
Provide clear, timely communication during incidents to stakeholders, development teams, and management;
Contribute to continuous improvement of incident response processes and tool utilization;
Participate in on-call rotations, ensuring timely response to critical incidents and proper handoff procedures;
Provide operational support and guidance to development teams regarding system reliability and performance.

##### We need your professional experience:

1-3 years in L2 support, Site Reliability Engineering, technical support, or a related role with incident response experience;
Hands-on experience with Grafana dashboards, VictoriaMetrics, Prometheus, and metrics exporters for system health monitoring and performance analysis;
Proficiency with OpenSearch (Kibana web interface), log aggregation, search queries, and log analysis for troubleshooting and root cause investigation;
Experience with PagerDuty, Grafana On-call, or similar alerting systems for incident response, escalation procedures, and on-call operations;
Strong analytical skills for identifying issues using provided monitoring tools, dashboards, and alerting systems;
Clear written and verbal communication skills for incident reporting, stakeholder updates, and creating/updating runbooks and troubleshooting guides;
Understanding of cloud concepts and familiarity with AWS services (EC2, EKS, RDS, S3) for context in incident response and escalation;
Systematic approach to problem-solving, ability to follow runbooks, and experience with incident response procedures;
Ability to quickly learn and effectively use monitoring, logging, and alerting tools provided by DevOps/SystemRE/PlatformRE teams.

##### We appreciate it if you have these personal qualities:

Strong desire to learn new technologies and tools, with ability to adapt to changing monitoring and alerting systems;
Ability to interpret metrics, logs, and system behavior to make informed decisions;
Attention to detail: ensuring accuracy in infrastructure changes, configurations, and deployment processes;
Effective communication: the ability to explain technical concepts clearly to both technical and non-technical stakeholders.

##### We are seeking those who align with our core values:

GROWE TOGETHER: Our team is our main asset. We work together and support each other to achieve our common goals;
DRIVE RESULT OVER PROCESS: We set ambitious, clear, measurable goals in line with our strategy and drive Growe to success;
BE READY FOR CHANGE: We see challenges as opportunities to grow and evolve. We adapt today to win tomorrow.