Tech Lead - Cloud Reliability Engineering

11 Minutes ago • All levels
DevOps

Job Description

The Mendix Cloud Reliability Engineering team is responsible for delivering and supporting a high-quality, highly available public cloud platform for tens of thousands of mission-critical customer applications. This role involves automating operations, solving customer issues, providing on-call support, and developing monitoring and alerting systems to ensure platform performance and availability. The ideal candidate will have strong coding skills, experience with cloud infrastructure, containerization, and a passion for investigating complex distributed systems.
Good To Have:
  • Experience with Golang.
  • Experience with the Mendix low-code platform!
Must Have:
  • Writing software/scripts to automate operations on the platform, reducing support requests and engineering time.
  • Solving operational issues for customers, both by investigating them technically and by liaising between Mendix Support (1st line) and other development teams in R&D.
  • Providing out-of-hours support for critical customer issues on an on-call basis.
  • Creating and maintaining monitoring & alerting systems to provide real-time visibility into the performance and availability of the platform (SRE).
  • Developing and maintaining dashboards & reports to track key performance indicators and identify trends and issues.
  • Experience with Site Reliability Engineering (SRE).
  • Coding skills, ideally in Python.
  • Good knowledge of infrastructure (AWS).
  • Experience with Infrastructure as Code (IaC), preferably Terraform or OpenTofu.
  • Strong experience with containerization technologies, primarily Kubernetes.
  • Comfortable writing Python scripts that automate complex tasks and reduce manual effort.
  • Excellent communication and people skills, both written and verbal.
  • Ability to spearhead, manage and explain complex technical issues and reduce them to a form that less technical customers & colleagues can understand.
  • A deep understanding of cloud architecture/deployment and infrastructure services like web servers, load balancing, SSL/TLS/X.509, etc.
  • Experience with monitoring and logging tools such as CloudWatch, ELK, Grafana, Datadog or Prometheus.
  • Proven experience administering, developing against, or architecting on a cloud platform.
  • Strong experience with containers and Linux/Unix systems.
  • Familiarity with SQL/databases (we primarily use PostgreSQL).
  • A passion for investigating complex issues and finding solutions on a platform with many distributed applications.

Mendix – the leading low-code application development platform:

The Mendix Platform uses visual modeling to abstract long-form coding out of application development. Our customers use Mendix to create and deploy better software for the enterprise, faster. Mendix enables business users and developers to collaborate throughout the development process.

Read our Customer Stories to learn more about the wealth of software and solutions global organizations have built with the Platform.

At Mendix we strive to maintain a diverse, open, and safe working environment where people can be their true selves. We value every voice, celebrate individuality, and appreciate the diversity of thought and experience. People who work here are driven, smart, and really good at what they do.

As this market evolves, we encourage people of all skill levels, clients and candidates alike, to work with the platform. Apply today to discover how you can make a meaningful impact with Mendix.

About the Team:

If you are an experienced developer and want to make a difference for tens of thousands of developers in our community, we have an opportunity for you!

As a company, we’re in constant communication with our community, but we need your field experience to improve our product and take it to the next level. As the backbone for countless enterprises, the Mendix Cloud hosts tens of thousands of mission-critical customer applications. These include vital systems for insurance, comprehensive supply chain optimization, advanced real estate solutions, AI-driven decision-making tools, enterprise SaaS platforms, and sophisticated industrial automation, all relying on our cloud for unparalleled reliability, performance, and the agility to integrate next-generation technologies.

The team is responsible for delivering and supporting a high-quality, highly available public cloud platform where our customers can run their Mendix apps. We develop and run the Mendix Cloud infrastructure and services that offer deployment, operations and monitoring.

About the Role:

You'll help drive digital innovation by:

  • Writing software/scripts to automate operations on our platform, reducing support requests and engineering time.
  • Solving operational issues for our customers, both by investigating them technically and by liaising between Mendix Support (1st line) and other development teams in R&D.
  • Providing out-of-hours support for critical customer issues on an on-call basis.
  • Creating and maintaining monitoring & alerting systems to provide real-time visibility into the performance and availability of the platform (SRE).
  • Developing and maintaining dashboards & reports to track key performance indicators and identify trends and issues.

You’re the innovator we need if:

  • You have experience with Site Reliability Engineering (SRE).
  • You have coding skills, ideally in Python; it’s a plus if you also have experience with Golang.
  • You have good knowledge of infrastructure (AWS).
  • You have experience with Infrastructure as Code (IaC), preferably Terraform or OpenTofu.
  • You have strong experience with containerization technologies, primarily Kubernetes.
  • You're comfortable writing Python scripts that automate complex tasks and reduce manual effort.
  • You have excellent communication and people skills, both written and verbal.
  • You have the ability to spearhead, manage and explain complex technical issues and reduce them to a form that less technical customers & colleagues can understand.
  • You have a deep understanding of cloud architecture/deployment and infrastructure services like web servers, load balancing, SSL/TLS/X.509, etc.
  • You have experience with monitoring and logging tools such as CloudWatch, ELK, Grafana, Datadog or Prometheus.
  • We use Datadog, Prometheus, CloudWatch, PagerDuty and Grafana internally, but experience with other tools such as Splunk or New Relic is welcome too.
  • You have proven experience administering, developing against, or architecting on a cloud platform. AWS is the platform we use, but experience on GCP or Azure is OK too.
  • You have strong experience with containers and Linux/Unix systems.
  • You are familiar with SQL/databases (we primarily use PostgreSQL).
  • You have a passion for investigating complex issues and finding solutions on a platform with many distributed applications.

It's nice (but not essential) if:

  • You have experience with the Mendix low-code platform!

If you see a job description and think, “I’d be perfect for that” but your experience doesn’t align perfectly with the qualifications – don’t let that hold you back. We’re always eager to hire talented, passionate candidates – so give it a try and apply.
