Technical Architect - Monitoring

1 Month ago • 10-12 Years • Technical Art

Job Summary

Job Description

As a Monitoring SME & Architect, you will design and implement comprehensive monitoring solutions that ensure system uptime, health, performance, and reliability. You will reduce alert volume, implement intelligible alerting and alert correlation, and set up an early warning system. Collaboration across teams and centralized dashboarding are key to removing silos. You will also architect monitoring configurations that are scalable, secure, and automated, with future AI integration in scope. Key accountabilities include ensuring monitoring effectiveness, setting up centralized monitoring configuration, driving down alert volume, and implementing advanced monitoring alerts for golden signals such as latency and errors. The role requires hands-on experience with monitoring tools, logs, metrics, and various technologies. It also includes creating dashboards, analyzing data, and integrating with ITSM tools. Minimum requirements include an associate’s degree or equivalent, 10-12 years of IT experience, and 6 years of monitoring experience.
Must have:
  • Monitoring Tool Administration experience.
  • Experience with Logs, Metrics, Traces, Parsing, and RegEx.
  • Experience implementing APM, EUM, and API endpoints.
  • Experience with Ansible, Python, and Selenium.
  • Experience with Azure, VMware, and AWS at enterprise scale.
  • Creating and analyzing dashboards.

Job Details

Scope:
•    As a Monitoring SME & Architect, you will be responsible for designing and implementing comprehensive monitoring solutions and processes to ensure uptime, system health, performance, and reliability. You will be responsible for reducing alert volume, implementing intelligible alerting, alert correlation, and alert compression, measuring the signal-to-noise ratio, and setting up an early warning system across Operations. You will collaborate across teams and create centralized dashboards and visibility to remove silos. You will architect monitoring configurations in a scalable and secure model, leveraging automation, with AI-integrated monitoring operations as a future scope.


Our current technical environment:

•    Technical Skills: Monitoring tool administration, log indexing and pipelines, Azure, VMware, Ansible, Python, Selenium, Terraform, Shell, Windows, Linux, Grok parsing
•    Problem-solving skills – Able to devise technical and creative solutions, and to use analytics to understand patterns and proactively identify gaps.
•    Communication skills – Effective communication is key in this role to gather data about problems, prepare detailed notes and reports, and update users on next steps.
•    Time management – Excellent time management skills and the ability to set priorities when handling multiple cases.
•    Team collaboration – Routinely work with other functions to resolve user issues, collaborating successfully with team members and coworkers.
•    Highly motivated, hands-on personality.
•    Ability to learn quickly in a challenging environment.

Key Accountability 

  • Monitoring Effectiveness – Ensure the monitoring framework and its enhancements are set up to increase proactive identification and resolution of issues before customer impact.
  • Set up and maintain centralized monitoring configuration as code.
  • Consistently drive alert volume down and eliminate false alerts.
  • Set up advanced monitoring alerts for the golden signals, i.e. latency, errors, throughput, and saturation.
  • Transform from traditional symptomatic CPU and memory monitors to more advanced alert correlation that pinpoints issues directly and supports predictive monitoring.
  • Create and implement synthetic or end-user monitoring using Python and Selenium for customer experience monitoring.
  • Set up API endpoint monitoring and measure uptime and availability across customer, product, and infrastructure endpoints.
  • Implement SLO, SLI, and error budget concepts to measure and set up a maturity model.
  • Maintain and manage a code repository built for scale and with security measures.
  • Leverage automation to push changes to monitoring tools.
  • Set up an orchestration mechanism for onboarding and decommissioning to ensure operational readiness.
  • Set up dashboards and create visibility across all cross-functional teams.
  • Establish telemetry for automated collection of data across metrics, logs, and traces.
  • Continuously analyze data to identify gaps and implement improvements.
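
For candidates less familiar with two of the measures named above, the following is a minimal sketch of the arithmetic behind an error budget and an alert signal-to-noise figure. It is illustrative only and not a description of our tooling: the function names, the request-count inputs, and the choice of "share of actionable alerts" as a proxy for signal-to-noise are all assumptions made for this example.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the fraction of the error budget still unspent in a window.

    slo_target is the availability objective as a fraction (e.g. 0.999).
    The budget is the number of failures the SLO tolerates in the window.
    A negative result means the budget has been overspent.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        # A 100% target (or an empty window) leaves no budget to spend.
        return 0.0
    return 1.0 - failed_requests / allowed_failures


def actionable_alert_share(actionable_alerts, total_alerts):
    """Return the share of alerts that required action (assumed proxy
    for alert signal-to-noise; other definitions are possible)."""
    if total_alerts == 0:
        return 0.0
    return actionable_alerts / total_alerts


if __name__ == "__main__":
    # Hypothetical figures chosen only to exercise the functions.
    print(error_budget_remaining(0.999, 1_000_000, 250))
    print(actionable_alert_share(40, 200))
```

Tracking both figures over time is one way to show whether alert volume is falling while proactive detection improves.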


Minimum Requirements 

  • Associate’s degree (or equivalent) in Computer Science; Information Technology or related field preferred
  • 10-12 years of IT experience with 6 years of Monitoring Experience
  • Experience administering monitoring tools such as AppDynamics, SolarWinds, Grafana, Zabbix, Datadog, and the ELK Stack
  • Hands-on experience with logs, metrics, traces, parsing, RegEx, and tagging
  • Hands-on experience implementing APM, EUM, synthetics, API endpoint monitoring, etc.
  • Hands-on experience integrating with ITSM tools such as ServiceNow and Jira
  • Hands-on experience with Ansible, Python, Selenium, and Shell
  • Hands-on experience with Azure, VMware, and AWS at enterprise scale
  • Hands-on experience creating and analyzing dashboards
  • Excellent interpersonal and influencing skills: interacting appropriately with colleagues of many technical skill levels, and remaining calm and courteous while resolving problems in high-stress situations.

Our Values


If you want to know the heart of a company, take a look at its values. Ours unite us. They are what drive our success – and the success of our customers. Does your heart beat like ours? Find out here: Core Values

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.

About The Company

We are a proven, passionate bunch of disruptors. Our work is all about tapping into your potential so we can deliver the best solutions and customer experiences on the planet. Collaboration, respect, and a great work-life balance earned us the title of "Best Place to Work- Employees' Choice" by Glassdoor. Our people are smart, creative, rock stars with over 400 patents and 10,000 people years of domain expertise. Blue Yonder is the world leader in digital supply chain and omni-channel commerce fulfillment. Our intelligent, end-to-end platform enables retailers, manufacturers and logistics providers to seamlessly predict, pivot and fulfill customer demand. With Blue Yonder, you can make more automated, profitable business decisions that deliver greater growth and re-imagined customer experiences. Blue Yonder - Fulfill your Potential.™
