Lead Systems Engineer, High-Performance Computing

1 Month ago • 10-13 Years • System Design • $160,600 PA - $232,900 PA

Job Summary

Job Description

Visa is seeking a Lead Systems Engineer specializing in High-Performance Computing (HPC) to join its Distributed Compute Engineering (DCE) team. The role involves designing and automating server infrastructure, with a focus on GPU as a Service (GaaS) and HPC platforms. It calls for extensive datacenter experience and advanced technical knowledge of x86 technologies (NVMe, GPU, PCIe), enterprise server components, processor and GPU systems, UEFI, BIOS, and hardware lifecycle management. The engineer will manage compute infrastructure, use automation tools such as Ansible, run performance benchmarks, evaluate new technologies, and apply strong scripting skills in PowerShell and Python. The role requires both independent work and teamwork, strong analytical and troubleshooting abilities, and the capacity to mentor junior staff. This is a hybrid position requiring 2-3 days in the office per week.
Must have:
  • Deploy, manage, and optimize GPU as a Service (GaaS) and HPC platforms.
  • Extensive datacenter experience in complex, geographically distributed IT infrastructures.
  • Advanced knowledge of x86 technologies, NVMe, GPU, PCIe.
  • Expertise in server components, processors, GPU systems, memory hierarchy, and hardware security.
  • Proficiency in out-of-band management, UEFI, and BIOS.
  • Experience in hardware lifecycle management and firmware/OS driver certifications.
  • Skilled in infrastructure management tools and Ansible for automation.
  • Ability to perform performance benchmarking and evaluate new technologies on Linux, Windows, containers, and virtualized platforms.
  • Advanced scripting skills in PowerShell and Python.
  • Strong analytical and troubleshooting abilities, with the capacity to mentor junior staff.
Good to have:
  • Experience with HP ProLiant or Dell PowerEdge server product lines.
  • Experience in system monitoring for unattended operations.
  • Engineering knowledge to troubleshoot storage issues (hosts, SAN switches, storage devices).
  • Engineering knowledge in TCP/IP networking (link aggregation, switches, routing, load-balancing).
  • Ability to write technical designs, documentation, and presentations for compute infrastructure.
  • Ability to provide level 3 support and guide level 2 administrators.
Perks:
  • Medical
  • Dental
  • Vision
  • 401(k)
  • FSA/HSA
  • Life Insurance
  • Paid Time Off
  • Wellness Program
  • Potential sales incentive payments
  • Bonus and equity eligibility

Job Details

Company Description

Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure payments network, enabling individuals, businesses, and economies to thrive while driven by a common purpose – to uplift everyone, everywhere by being the best way to pay and be paid.

Make an impact with a purpose-driven industry leader. Join us today and experience Life at Visa.

Job Description

The IaaS Systems and Storage & Engineering (ISSE) team is part of the Operations & Infrastructure technology organization. Distributed Compute Engineering (DCE) is part of ISSE, and High-Performance Compute platform engineering is part of DCE. Our vision, mission, and purpose are summarized as follows:

Vision: To become a leading technical engineering professional, pioneering the design and automation of server infrastructure. We envision creating highly secure and efficient operations environments that drive business success and technological advancement.

Mission: Our mission is to deliver high-quality server infrastructure design and automated implementation. We are committed to operating in complex, highly secure, and highly available environments, while maintaining rigorous operations, security, and procedural models.

Purpose: The purpose of this role is to utilize strong hands-on technical engineering skills to design and automate the implementation of server infrastructure based on business requirements. This role will interact with technology domain experts to maintain high security and availability in complex operational environments, thereby driving business efficiency and security.

Essential Functions:

  • GPU as a Service and High-Performance Compute Platform Support: Expertise in deploying, managing, and optimizing GPU as a Service (GaaS) and high-performance compute platforms to support advanced computational workloads.
  • Extensive Datacenter Experience: Proficient in managing complex, geographically distributed IT infrastructures to ensure high availability and performance.
  • Advanced Technical Knowledge: Profound understanding of high-performance, highly available, and secure computing systems utilizing x86 technologies and protocols (NVMe, GPU, PCIe).
  • Enterprise Server and Component Expertise: In-depth knowledge of server components (storage/network controllers, HBA, SSDs) and their functionalities, essential for maintaining high-performance compute environments.
  • Processor and GPU Systems Proficiency: Strong grasp of Intel/AMD architectures, GPU systems, memory hierarchy, and hardware-level security to enhance system performance and reliability.
  • Out-of-Band, UEFI, and BIOS Expertise: Comprehensive understanding of out-of-band management, UEFI, BIOS settings, and their impact on system performance and security in high-performance computing environments.
  • Hardware Lifecycle Management: Experienced in hardware lifecycle management, including firmware and OS driver certifications, to ensure the longevity and reliability of compute resources.
  • Infrastructure Management and Automation: Proficient in installing, configuring, supporting, and maintaining compute infrastructure management tools, with skills in Ansible for automation to streamline deployment and operational tasks.
  • Performance Benchmarking and Tech Evaluation: Capable of running performance benchmarks and evaluating new technologies for various platforms (Linux, Windows, containerized, and virtualized) to ensure optimal performance.
  • Scripting Proficiency: Advanced skills in scripting languages such as PowerShell and Python to automate and optimize infrastructure tasks.
  • Team and Independent Work: Highly motivated, excellent team player, capable of working independently, with strong analytical and troubleshooting abilities to resolve complex issues and mentor junior staff.

This is a hybrid position. Hybrid employees can alternate time between remote and office work. Employees in hybrid roles are expected to work from the office 2-3 set days a week (determined by leadership/site), with a general guidepost of being in the office 50% or more of the time based on business needs.

Qualifications

Basic Qualifications:
• 10+ years of relevant work experience with a Bachelor’s Degree or at least 7 years of work experience with an Advanced degree (e.g. Masters, MBA, JD, MD) or 4 years of work experience with a PhD, OR 13+ years of relevant work experience.

Preferred Qualifications:
• 12 or more years of work experience with a Bachelor’s Degree or 8-10 years of experience with an Advanced Degree (e.g. Masters, MBA, JD, MD) or 6+ years of work experience with a PhD.
• Bachelor's degree or higher in Computer Science, Information Systems, Computer Engineering, Electrical Engineering, or another relevant engineering field.
• Broad knowledge of hardware, software, network, and application deployments through automation.
• Hardware and infrastructure automation experience in at least one of the following server product lines - HP ProLiant, Dell PowerEdge.
• Strong technical, analytical, and troubleshooting skills, with the ability to explain technical concepts and provide guidance to junior staff.
• Experience in system monitoring with tools supporting unattended operations.
• Engineering knowledge to troubleshoot and solve storage issues (hosts, SAN switches, and storage devices).
• Engineering knowledge in TCP/IP networking – link aggregation redundancy, switches, routing, and load-balancing.
• Ability to write technical designs, documentation, and presentations for Compute Infrastructure.
• Ability to provide level 3 support and guide level 2 administrators on problem resolution.

Additional Information

Work Hours: Vary depending on the needs of the department.

Travel Requirements: This position requires travel 5-10% of the time.

Mental/Physical Requirements: This position will be performed in an office setting. The position will require the incumbent to sit and stand at a desk, communicate in person and by telephone, and frequently operate standard office equipment, such as telephones and computers.

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Visa will consider for employment qualified applicants with criminal histories in a manner consistent with applicable local law, including the requirements of Article 49 of the San Francisco Police Code.

U.S. APPLICANTS ONLY: The estimated salary range for a new hire into this position is 160,600.00 to 232,900.00 USD per year, which may include potential sales incentive payments (if applicable). Salary may vary depending on job-related factors which may include knowledge, skills, experience, and location. In addition, this position may be eligible for bonus and equity. Visa has a comprehensive benefits package for which this position may be eligible that includes Medical, Dental, Vision, 401(k), FSA/HSA, Life Insurance, Paid Time Off, and Wellness Program.

About The Company

At Visa, we are driven by a common purpose – to uplift everyone, everywhere by being the best way to pay and be paid. As our products and technology have evolved with the world, Visa remains ubiquitous, reaching new customers in new and often invisible ways. We are at the center of this digital revolution with a network that connects people with over 80 million businesses all over the world.
