Engineering Farm Engineer

3 Months ago • 8 Years + • DevOps

Job Summary

Job Description

The Engineering Farm Engineer at NVIDIA is responsible for architecting and maintaining solutions for large compute clusters, ensuring efficiency and a positive user experience for both customers and engineers. This involves scaling systems through automation, performance tuning, and proactive identification of potential outages. Responsibilities include monitoring system health, handling production issues, conducting blameless postmortems, and designing solutions using efficient algorithms and a standard SDLC. The role requires expertise in software design, data structures, and at least one of Python, Perl, Go, or Ruby, along with experience in mentoring junior engineers and supporting large-scale server infrastructure.
Must have:
  • 8+ years experience in CS or related field
  • SW Design, Algorithms, Data Structures
  • Python/Perl/Go/Ruby experience
  • SQL & NoSQL database knowledge
  • System problem-solving skills
  • Debugging & automation skills
Good to have:
  • Experience with LSF and SLURM schedulers
  • Linux administration or automation
  • Experience leading projects from inception to completion

Job Details

For two decades, we have pioneered visual computing, the art and science of computer graphics. With our invention of the GPU - the engine of modern visual computing - the field has expanded to encompass video games, movie production, product design, medical diagnosis, and scientific research. Today, we stand at the beginning of the next era, the AI computing era, ignited by a new computing model, GPU deep learning. This new model - where deep neural networks are trained to recognize patterns from massive amounts of data - has proven to be deeply effective at solving some of the most complex problems in everyday life.

The Engineering Farm Engineer is responsible for architecting solutions around our large compute cluster to make it work efficiently and to improve the experience of both customers and the engineers supporting the cluster. Much of our SW engineering work focuses on eliminating manual work through automation, performance tuning, and growing the efficiency of production systems. Practices such as limiting time spent on reactive operational work, blameless postmortems, and proactive identification of potential outages factor into the iterative improvement that is key to product quality and to interesting, dynamic day-to-day work. We promote self-direction to work on meaningful projects, while also striving to build an environment that provides the support and mentorship needed to learn and grow.

What you will be doing:

  • Maintain server infrastructure and services once they are live by measuring and monitoring availability, latency, and overall system health.

  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

  • Work with different subject-matter experts (SMEs) to provide quality resolutions to customers' production issues.

  • Practice sustainable incident response and blameless postmortems.

  • Understand a complex and vast infrastructure and support it during on-call weeks.

  • Independently architect and design solutions with a SW engineering approach, using the right, efficient algorithms and a regular SDLC process that includes requirements gathering, SW design, testing, deployment, and release.

  • Support large-scale server infrastructure with monitoring, logging, and alerting to meet promised uptime.

  • Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation, and refinement.

  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity management, and launch reviews.

What we need to see:

  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics) with 8+ years of experience, or equivalent practical experience.

  • Experience with SW design, algorithms, and data structures.

  • Experience in one or more of the following: Python, Perl, Go, or Ruby, using an object-oriented approach.

  • Experience in mentoring junior engineers or leading a team.

  • Basic understanding of SQL & NoSQL Data platforms, database queries, and data analysis.

  • Interest in crafting, analyzing, and fixing large-scale distributed systems.

  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.

  • Ability to debug and optimize code and automate routine tasks.

  • Ability to learn quickly and adapt to different platforms as per the needs of the project.

Ways to stand out from the crowd:

  • Demonstrated experience architecting and building scalable, maintainable tools following SW best practices.

  • Demonstrated experience leading a project from inception to completion, along with significant independent contribution.

  • Good hands-on experience with schedulers like LSF and SLURM.

  • Good understanding of Linux administration, or experience building automation around it.

  • Experience debugging infrastructure or UNIX-related issues.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.
