Senior Systems Engineer HPC

3 Weeks ago • 10-15 Years • Administrative • $116,100 PA - $198,440 PA

Job Summary

Job Description

Rackspace seeks a skilled HPC System Engineer to manage a flagship client's HPC infrastructure. Responsibilities include designing, implementing, maintaining, and optimizing HPC clusters; monitoring performance, identifying bottlenecks, and implementing solutions; managing user accounts and resource allocation; performing system maintenance, updates, and patching; troubleshooting hardware and software issues; participating in infrastructure upgrades and expansions; evaluating and recommending hardware/software solutions; implementing and managing storage systems and networking infrastructure; optimizing system configurations and application performance; profiling and analyzing application performance; implementing and utilizing performance monitoring tools; providing technical support and training; collaborating with researchers and scientists; documenting system configurations; assisting with day-to-day operations and ticket management; implementing and maintaining security measures; managing data backups and disaster recovery procedures.
Must have:
  • 10+ yrs exp in systems, 5+ in HPC
  • Linux OS knowledge (Rocky, Ubuntu)
  • Cluster mgmt tools (Slurm, PBS)
  • High-speed interconnects exp
  • Parallel file systems knowledge
  • Scripting (R, Python, Bash)
  • HPC hardware architecture understanding
  • Configuration management software exp
  • Linux security & shell scripting
  • Strong communication skills

Job Details

Job Summary: Rackspace is seeking a highly skilled and motivated HPC System Engineer to join our team. You will work directly for one of our flagship clients, designing, implementing, maintaining, and optimizing their high-performance computing (HPC) infrastructure. You will work closely with researchers, scientists, and other engineers to ensure the efficient and reliable operation of the HPC systems.

Work Location: 100% Remote. Because this role supports a customer in the Seattle area, we prefer to hire in either the PST or CST time zone.
 
Travel: There may be minimal travel to San Antonio, TX, or Seattle, WA.

Responsibilities:

    • Install, configure, and maintain HPC clusters, including hardware and software components.
    • Monitor system performance, identify bottlenecks, and implement solutions to optimize performance.
    • Manage user accounts, permissions, and resource allocation.
    • Perform regular system maintenance, updates, and patching.
    • Troubleshoot and resolve hardware and software issues in a timely manner.
    • Participate in the design and planning of HPC infrastructure upgrades and expansions.
    • Evaluate and recommend hardware and software solutions to meet evolving computational needs.
    • Implement and manage storage systems, networking infrastructure, and interconnects (e.g., InfiniBand).
    • Optimize system configurations and application performance for HPC workloads.
    • Profile and analyze application performance to identify areas for improvement.
    • Implement and utilize performance monitoring tools and techniques.
    • Provide technical support and training to HPC users.
    • Collaborate with researchers and scientists to understand their computational requirements.
    • Work closely with HPC architects and engineers to ensure that research needs are met.
    • Document system configurations, procedures, and best practices.
    • Assist HPC engineers and architects with day-to-day operations and ticket management.
    • Implement and maintain security measures to protect HPC infrastructure and data.
    • Ensure compliance with relevant security policies and regulations.
    • Manage data backups and disaster recovery procedures.

Qualifications:

    • Bachelor's degree in computer science, engineering, or a related field. Experience may substitute for the degree.
    • Minimum of 10 years of experience working with systems; 5 years specifically with HPC.
    • Strong knowledge of Linux operating systems (e.g., Rocky, Ubuntu).
    • Experience with cluster management tools (e.g., Slurm, PBS).
    • Familiarity with high-speed interconnects (e.g., InfiniBand, Ethernet).
    • Knowledge of parallel file systems (e.g., Lustre, Ceph, GPFS).
    • Proficiency in scripting languages (e.g., R, Python, Bash).
    • Understanding of HPC hardware architectures and technologies (e.g., CPUs, GPUs, memory).
    • Strong demonstrated experience with major configuration management software (e.g., Terraform, Ansible), including application packaging and installation.
    • Must have strong knowledge of Linux security and Linux shell scripting.
    • Strong communication and interpersonal skills.
    • Knowledge of data transfer protocols and large-scale storage solutions.
The following information is required by pay transparency legislation in the following states: CA, CO, HI, NY, and WA. This information applies only to individuals working in these states.
 
·       The anticipated starting pay range for Colorado is: $116,100 - $170,280.
·       The anticipated starting pay range for the states of Hawaii and New York (not including NYC) is: $123,600 - $181,280.
·       The anticipated starting pay range for California, New York City and Washington is: $135,300 - $198,440.
 
Unless already included in the posted pay range and based on eligibility, the role may include variable compensation in the form of bonus, commissions, or other discretionary payments. These discretionary payments are based on company and/or individual performance and may change at any time. Actual compensation is influenced by a wide array of factors including but not limited to skill set, level of experience, licenses and certifications, and specific work location.


About Rackspace Technology
We are the multicloud solutions experts. We combine our expertise with the world’s leading technologies — across applications, data and security — to deliver end-to-end solutions. We have a proven record of advising customers based on their business challenges, designing solutions that scale, building and managing those solutions, and optimizing returns into the future. Named a best place to work year after year according to Fortune, Forbes and Glassdoor, we attract and develop world-class talent. Join us on our mission to embrace technology, empower customers and deliver the future.
 
 
More on Rackspace Technology
Though we’re all different, Rackers thrive through our connection to a central goal: to be a valued member of a winning team on an inspiring mission. We bring our whole selves to work every day. And we embrace the notion that unique perspectives fuel innovation and enable us to best serve our customers and communities around the globe. We welcome you to apply today and want you to know that we are committed to offering equal employment opportunity without regard to age, color, disability, gender reassignment or identity or expression, genetic information, marital or civil partner status, pregnancy or maternity status, military or veteran status, nationality, ethnic or national origin, race, religion or belief, sexual orientation, or any legally protected characteristic. If you have a disability or special need that requires accommodation, please let us know.

Similar Jobs

ByteDance - Cloud Technical Support

ByteDance

Singapore (On-Site)
2 Weeks ago
NVIDIA - Senior SRAM Engineer, Circuit Design

NVIDIA

Canada (Hybrid)
2 Months ago
Google - Network Operations Residency Program

Google

Bengaluru, Karnataka, India (On-Site)
2 Days ago
Gigamon - Sr Hardware Design Engineer

Gigamon

Chennai, Tamil Nadu, India (Hybrid)
2 Months ago
The Walt Disney Company - Security Officer (F/M/NB) - Permanent Contract

The Walt Disney Company

Île-de-France, France (On-Site)
3 Months ago
Anavation - Atlassian Subject Matter Expert

Anavation

Colorado Springs, Colorado, United States (Remote)
1 Week ago
Netflix - Administrative Assistant - Commerce Engineering

Netflix

Los Gatos, California, United States (On-Site)
1 Week ago
Hawk Eye Innovations - Football Systems Operator

Hawk Eye Innovations

Curitiba, State Of Paraná, Brazil (On-Site)
1 Week ago
Google - Database Engineer

Google

Fremont, California, United States (On-Site)
1 Week ago


Similar Skill Jobs

GoDaddy - Software Development Engineer in Test

GoDaddy

Pune, Maharashtra, India (Hybrid)
7 Hours ago
CloudLinux - Senior Go Developer

CloudLinux

Masovian Voivodeship, Poland (Remote)
1 Month ago
Forescout - Professional Services Engineer

Forescout

Milan, Lombardy, Italy (On-Site)
2 Months ago
Meta - Software Engineer, Machine Learning

Meta

London, England, United Kingdom (On-Site)
5 Months ago
InnoPhase IoT - Design Verification Lead

InnoPhase IoT

Bengaluru, Karnataka, India (On-Site)
1 Day ago
Info Stretch - Java Support Software Engineer

Info Stretch

Mexico (On-Site)
6 Months ago
NVIDIA - CAD Layout Design Engineer

NVIDIA

Bengaluru, Karnataka, India (Hybrid)
1 Week ago
Interactive Brokers - Senior Python Developer – Compliance Technology

Interactive Brokers

Mumbai, Maharashtra, India (Hybrid)
6 Months ago
Next Level Business Services - Sr. Big Data Engineer in San Francisco, CA / McLean, VA

Next Level Business Services

San Francisco, California, United States (On-Site)
6 Months ago
Microsoft - Senior Software Engineer

Microsoft

Vancouver, British Columbia, Canada (On-Site)
1 Week ago


Jobs in United States

That's No Moon - Senior Narrative Gameplay Animator (Project Hire)

That's No Moon

Los Angeles, California, United States (Remote)
1 Month ago
Hudl - Tax Manager

Hudl

Lincoln, Nebraska, United States (Hybrid)
1 Day ago
Netflix - Director, Product Management (Demand Connectivity) - Ads

Netflix

Los Gatos, California, United States (On-Site)
2 Weeks ago
Penumbra - Vascular Clinical Specialist (Southern CT and Hudson Valley, NY)

Penumbra

Home, Washington, United States (Remote)
6 Months ago
Critical mass - Project Manager

Critical mass

New York, United States (On-Site)
7 Hours ago
Nagarro - Associate Engineer

Nagarro

Atlanta, Georgia, United States (On-Site)
6 Months ago
Google - Senior Staff Data Scientist, Product

Google

Sunnyvale, California, United States (On-Site)
2 Weeks ago
Google - Security Analyst, Product Security Engineering, Cloud CISO

Google

Sunnyvale, California, United States (On-Site)
2 Weeks ago
Google - Software Engineer III, YouTube Ads Infrastructure

Google

Kirkland, Washington, United States (On-Site)
2 Days ago
Scientific Games - General Manager, iLottery

Scientific Games

Pennsylvania, United States (Remote)
3 Months ago


Administrative Jobs

Mattel Inc - Accounting Administrator

Mattel Inc

Foshan, Guangdong Province, China (On-Site)
4 Months ago
ByteDance - Administration Policy and Planning

ByteDance

Dubai, Dubai, United Arab Emirates (On-Site)
2 Weeks ago
Next Level Business Services - SAP QM

Next Level Business Services

St. Louis, Missouri, United States (On-Site)
6 Months ago
Intrepid Studios, Inc - Helpdesk Support Technician

Intrepid Studios, Inc

Canada (On-Site)
8 Months ago
Tesla - Outbound Employee

Tesla

Prüm, Rhineland-Palatinate, Germany (On-Site)
2 Months ago
On Location - Executive Assistant

On Location

New York, New York, United States (On-Site)
1 Month ago
Keywords Studios - Procurement Specialist

Keywords Studios

Pasig, Metro Manila, Philippines (Hybrid)
2 Weeks ago
The Walt Disney Company - Security Officer (Part-Time)

The Walt Disney Company

Burbank, California, United States (On-Site)
2 Weeks ago
Scientific Games - Field Service Technician I

Scientific Games

Kansas, United States (On-Site)
1 Month ago
Techland - CEO Personal Assistant

Techland

Warsaw, Masovian Voivodeship, Poland (On-Site)
3 Weeks ago
