Senior Systems Engineer HPC

1 Month ago • 10-15 Years • Administrative • $116,100 PA - $198,440 PA

Job Summary

Job Description

Rackspace seeks a skilled HPC System Engineer to manage a flagship client's HPC infrastructure. Responsibilities include designing, implementing, maintaining, and optimizing HPC clusters; monitoring performance, identifying bottlenecks, and implementing solutions; managing user accounts and resource allocation; performing system maintenance, updates, and patching; troubleshooting hardware and software issues; participating in infrastructure upgrades and expansions; evaluating and recommending hardware/software solutions; implementing and managing storage systems and networking infrastructure; optimizing system configurations and application performance; profiling and analyzing application performance; implementing and utilizing performance monitoring tools; providing technical support and training; collaborating with researchers and scientists; documenting system configurations; assisting with day-to-day operations and ticket management; implementing and maintaining security measures; managing data backups and disaster recovery procedures.
Must have:
  • 10+ yrs exp in systems, 5+ in HPC
  • Linux OS knowledge (Rocky, Ubuntu)
  • Cluster mgmt tools (Slurm, PBS)
  • High-speed interconnects exp
  • Parallel file systems knowledge
  • Scripting (R, Python, Bash)
  • HPC hardware architecture understanding
  • Configuration management software exp
  • Linux security & shell scripting
  • Strong communication skills

Job Details

Job Summary: Rackspace is seeking a highly skilled and motivated HPC System Engineer to join our team. You’ll be responsible for working directly for one of our flagship clients, designing, implementing, maintaining, and optimizing their high-performance computing (HPC) infrastructure. You will work closely with researchers, scientists, and other engineers to ensure the efficient and reliable operation of the HPC systems.

Work Location: 100% Remote. Because this role supports a customer in the Seattle area, we prefer to hire in either the PST or CST time zone.
 
Travel: There may be minimal travel to either San Antonio, TX or Seattle, WA.

Responsibilities:

    • Install, configure, and maintain HPC clusters, including hardware and software components.
    • Monitor system performance, identify bottlenecks, and implement solutions to optimize performance.
    • Manage user accounts, permissions, and resource allocation.
    • Perform regular system maintenance, updates, and patching.
    • Troubleshoot and resolve hardware and software issues in a timely manner.
    • Participate in the design and planning of HPC infrastructure upgrades and expansions.
    • Evaluate and recommend hardware and software solutions to meet evolving computational needs.
    • Implement and manage storage systems, networking infrastructure, and interconnects (e.g., InfiniBand).
    • Optimize system configurations and application performance for HPC workloads.
    • Profile and analyze application performance to identify areas for improvement.
    • Implement and utilize performance monitoring tools and techniques.
    • Provide technical support and training to HPC users.
    • Collaborate with researchers and scientists to understand their computational requirements.
    • Work closely with HPC architects and engineers to ensure that research needs are met.
    • Document system configurations, procedures, and best practices.
    • Assist HPC engineers and architects with day-to-day operations and ticket management.
    • Implement and maintain security measures to protect HPC infrastructure and data.
    • Ensure compliance with relevant security policies and regulations.
    • Manage data backups and disaster recovery procedures.

Qualifications:

    • Bachelor's degree in computer science, engineering, or a related field.  Experience may substitute for the degree.
    • Minimum of 10 years of experience working with systems; 5 years specifically with HPC.
    • Strong knowledge of Linux operating systems (e.g., Rocky, Ubuntu).
    • Experience with cluster management tools (e.g., Slurm, PBS).
    • Familiarity with high-speed interconnects (e.g., InfiniBand, Ethernet).
    • Knowledge of parallel file systems (e.g., Lustre, Ceph, GPFS).
    • Proficiency in scripting languages (e.g., R, Python, Bash).
    • Understanding of HPC hardware architectures and technologies (e.g., CPUs, GPUs, memory).
    • Strong demonstrated experience with a major configuration management tool (e.g., Terraform, Ansible), including application packaging and installation.
    • Must have strong knowledge of Linux security and Linux shell scripting.
    • Strong communication and interpersonal skills.
    • Knowledge of data transfer protocols and large-scale storage solutions.
The following information is required by pay transparency legislation in the following states: CA, CO, HI, NY, and WA. This information applies only to individuals working in these states.
 
·       The anticipated starting pay range for Colorado is: $116,100 - $170,280.
·       The anticipated starting pay range for the states of Hawaii and New York (not including NYC) is: $123,600 - $181,280.
·       The anticipated starting pay range for California, New York City and Washington is: $135,300 - $198,440.
 
Unless already included in the posted pay range and based on eligibility, the role may include variable compensation in the form of bonus, commissions, or other discretionary payments. These discretionary payments are based on company and/or individual performance and may change at any time. Actual compensation is influenced by a wide array of factors including but not limited to skill set, level of experience, licenses and certifications, and specific work location.


About Rackspace Technology
We are the multicloud solutions experts. We combine our expertise with the world’s leading technologies (across applications, data and security) to deliver end-to-end solutions. We have a proven record of advising customers based on their business challenges, designing solutions that scale, building and managing those solutions, and optimizing returns into the future. Named a best place to work year after year by Fortune, Forbes and Glassdoor, we attract and develop world-class talent. Join us on our mission to embrace technology, empower customers and deliver the future.
 
 
More on Rackspace Technology
Though we’re all different, Rackers thrive through our connection to a central goal: to be a valued member of a winning team on an inspiring mission. We bring our whole selves to work every day. And we embrace the notion that unique perspectives fuel innovation and enable us to best serve our customers and communities around the globe. We welcome you to apply today and want you to know that we are committed to offering equal employment opportunity without regard to age, color, disability, gender reassignment or identity or expression, genetic information, marital or civil partner status, pregnancy or maternity status, military or veteran status, nationality, ethnic or national origin, race, religion or belief, sexual orientation, or any legally protected characteristic. If you have a disability or special need that requires accommodation, please let us know.
