About Us
Aeries Technology is a Nasdaq-listed global professional services and consulting partner, headquartered in Mumbai, India, with centers in the USA, Mexico, Singapore, and Dubai. We provide mid-size technology companies with the right mix of deep vertical specialty, functional expertise, and the right systems and solutions to scale, optimize, and transform their business operations through unique, customized engagement models. Aeries is Great Place to Work certified by GPTW India, reflecting our commitment to fostering a positive and inclusive workplace culture for our employees. Read about us at https://aeriestechnology.com
About Business Unit
Constant Contact is a technology product company headquartered in Waltham, Massachusetts, United States. We are one of the top 2 providers of email marketing, social media marketing, event marketing, and online survey tools. We help 0.5 million SMBs grow their businesses by building stronger relationships with their customers, through a wide range of intuitive marketing applications designed to help small businesses and nonprofits expand their customer bases and nurture relationships. Read about us at https://www.constantcontact.com/about
In 2021, Constant Contact partnered with Aeries to set up its GTC, with the aim of consolidating its global operations in Bengaluru (Bangalore), India, with teams in IT, Engineering, Customer Support, and other General and Administrative functions. The GTC is a dedicated center focused on providing best practices, research, support, and training for specific business functions.
Big Reasons to Support Small - https://constantcontact.wistia.com/medias/pmlrsyb6hu
Roles and Responsibilities
At Constant Contact, we’re looking for individuals who are well-rounded in several aspects of Technical Operations. You will act as a responder to operational alerts and monitoring within Constant Contact. This role requires you to work with both developers and operations personnel to address and resolve issues and requests.
We are looking for a highly skilled and motivated Site Reliability Engineer to join our team. The successful candidate will be responsible for maintaining the reliability and uptime of critical services, with a focus on CentOS servers, Java application support, incident management, change management and Kubernetes administration.
The ideal candidate will have strong skills in ArgoCD for Kubernetes management and in Linux, basic scripting knowledge, and familiarity with modern monitoring, alerting, and automation tools. We are looking for someone who is self-motivated, has excellent communication skills (both oral and written), and is able to work both independently and collaboratively.
What you’ll do:
Perform routine system and application maintenance tasks. Follow SOPs to correct and prevent issues.
Monitor production systems, applications and overall performance.
Build and maintain observability so that the software team is prepared for uncertainties when the software goes live for end users.
Use tools to detect abnormal behavior in the software and, more importantly, to collect information that helps developers understand what causes the problem.
Conduct security checks.
Run meetings with our business partners, following established processes and procedures.
Write, update, and maintain policy and procedure documents.
Write scripts or code as necessary to develop tools and services that support the product.
Learn from post-mortems and prevent new incidents from occurring.
Perform admin work on various tools and applications such as JIRA and New Relic.
Maintain service-level objectives (SLOs): specific, quantifiable goals for keeping our “Golden Metrics” within their set parameters.
Who you are:
3-5+ years of experience working in a SaaS and Cloud environment.
Administer Kubernetes clusters, including management of applications using ArgoCD.
Monitor, maintain, and manage applications on CentOS servers, ensuring high availability and performance.
Respond to and manage active incidents, including running post-mortem meetings, performing root cause analysis, and ensuring timely resolution.
Use basic Linux scripting to automate routine tasks and improve operational efficiency.
Knowledge of project management tools such as JIRA and Confluence.
Knowledge of database systems such as MySQL and DB2.
Understand and drive incidents using incident management processes and procedures.
Execute change management procedures, run change management meetings and enforce safe and compliant changes to production environments.
Experience as a Linux (CentOS / RHEL) administrator
Deep knowledge of on-call responsibilities and awareness of time management, including maintaining on-call management tools such as xMatters.
Experience managing deployments using Jenkins.
Experience working with a suite of monitoring tools including New Relic, Splunk, and Nagios.
Experience with log aggregation tools like Splunk, Loki, or Grafana.
You must be comfortable troubleshooting and debugging web applications across the entire stack (i.e. the application layer, the database layer, the OS).
Production MySQL experience: replication, performance tuning, query optimization.
You should have familiarity with Ansible or other configuration management tools like Puppet.