Senior Systems Engineer HPC

2 Months ago • 10-15 Years • System Design • $116,100 PA - $198,440 PA

Job Summary

Job Description

Rackspace seeks a skilled HPC System Engineer to manage a flagship client's HPC infrastructure. Responsibilities include designing, implementing, maintaining, and optimizing HPC clusters; monitoring performance, identifying bottlenecks, and implementing solutions; managing user accounts and resource allocation; performing system maintenance, updates, and patching; troubleshooting hardware and software issues; participating in infrastructure upgrades and expansions; evaluating and recommending hardware/software solutions; implementing and managing storage systems and networking infrastructure; optimizing system configurations and application performance; profiling and analyzing application performance; implementing and utilizing performance monitoring tools; providing technical support and training; collaborating with researchers and scientists; documenting system configurations; assisting with day-to-day operations and ticket management; implementing and maintaining security measures; managing data backups and disaster recovery procedures.
Must have:
  • 10+ yrs exp in systems, 5+ in HPC
  • Linux OS knowledge (Rocky, Ubuntu)
  • Cluster mgmt tools (Slurm, PBS)
  • High-speed interconnects exp
  • Parallel file systems knowledge
  • Scripting (R, Python, Bash)
  • HPC hardware architecture understanding
  • Configuration management software exp
  • Linux security & shell scripting
  • Strong communication skills

Job Details

Job Summary: Rackspace is seeking a highly skilled and motivated HPC System Engineer to join our team. You’ll work directly for one of our flagship clients, designing, implementing, maintaining, and optimizing their high-performance computing (HPC) infrastructure. You will work closely with researchers, scientists, and other engineers to ensure the efficient and reliable operation of the HPC systems.

Work Location: 100% Remote. Because this role supports a customer in the Seattle area, we prefer to hire in the PST or CST time zones.
 
Travel: There may be minimal travel to either San Antonio, TX or Seattle, WA.

Responsibilities:

    • Install, configure, and maintain HPC clusters, including hardware and software components.
    • Monitor system performance, identify bottlenecks, and implement solutions to optimize performance.
    • Manage user accounts, permissions, and resource allocation.
    • Perform regular system maintenance, updates, and patching.
    • Troubleshoot and resolve hardware and software issues in a timely manner.
    • Participate in the design and planning of HPC infrastructure upgrades and expansions.
    • Evaluate and recommend hardware and software solutions to meet evolving computational needs.
    • Implement and manage storage systems, networking infrastructure, and interconnects (e.g., InfiniBand).
    • Optimize system configurations and application performance for HPC workloads.
    • Profile and analyze application performance to identify areas for improvement.
    • Implement and utilize performance monitoring tools and techniques.
    • Provide technical support and training to HPC users.
    • Collaborate with researchers and scientists to understand their computational requirements.
    • Work closely with HPC architects and engineers to ensure that research needs are met.
    • Document system configurations, procedures, and best practices.
    • Assist HPC engineers and architects with day-to-day operations and ticket management.
    • Implement and maintain security measures to protect HPC infrastructure and data.
    • Ensure compliance with relevant security policies and regulations.
    • Manage data backups and disaster recovery procedures.

Qualifications:

    • Bachelor's degree in computer science, engineering, or a related field.  Experience may substitute for the degree.
    • Minimum of 10 years of experience working with systems; 5 years specifically with HPC.
    • Strong knowledge of Linux operating systems (e.g., Rocky, Ubuntu).
    • Experience with cluster management tools (e.g., Slurm, PBS).
    • Familiarity with high-speed interconnects (e.g., InfiniBand, Ethernet).
    • Knowledge of parallel file systems (e.g., Lustre, Ceph, GPFS).
    • Proficiency in scripting languages (e.g., R, Python, Bash).
    • Understanding of HPC hardware architectures and technologies (e.g., CPUs, GPUs, memory).
    • Strong demonstrated experience with a major configuration management tool (e.g., Terraform, Ansible), including application packaging and installation.
    • Must have strong knowledge of Linux security and Linux shell scripting.
    • Strong communication and interpersonal skills.
    • Knowledge of data transfer protocols and large-scale storage solutions.
The following information is required by pay transparency legislation in the following states: CA, CO, HI, NY, and WA. This information applies only to individuals working in these states.
 
·       The anticipated starting pay range for Colorado is: $116,100 - $170,280.
·       The anticipated starting pay range for the states of Hawaii and New York (not including NYC) is: $123,600 - $181,280.
·       The anticipated starting pay range for California, New York City and Washington is: $135,300 - $198,440.
 
Unless already included in the posted pay range and based on eligibility, the role may include variable compensation in the form of bonus, commissions, or other discretionary payments. These discretionary payments are based on company and/or individual performance and may change at any time. Actual compensation is influenced by a wide array of factors including but not limited to skill set, level of experience, licenses and certifications, and specific work location.


About Rackspace Technology
We are the multicloud solutions experts. We combine our expertise with the world’s leading technologies — across applications, data and security — to deliver end-to-end solutions. We have a proven record of advising customers based on their business challenges, designing solutions that scale, building and managing those solutions, and optimizing returns into the future. Named a best place to work, year after year according to Fortune, Forbes and Glassdoor, we attract and develop world-class talent. Join us on our mission to embrace technology, empower customers and deliver the future.
 
 
More on Rackspace Technology
Though we’re all different, Rackers thrive through our connection to a central goal: to be a valued member of a winning team on an inspiring mission. We bring our whole selves to work every day. And we embrace the notion that unique perspectives fuel innovation and enable us to best serve our customers and communities around the globe. We welcome you to apply today and want you to know that we are committed to offering equal employment opportunity without regard to age, color, disability, gender reassignment or identity or expression, genetic information, marital or civil partner status, pregnancy or maternity status, military or veteran status, nationality, ethnic or national origin, race, religion or belief, sexual orientation, or any legally protected characteristic. If you have a disability or special need that requires accommodation, please let us know.
