HPC Data Center Operator

1 Month ago • All levels

Job Summary

Job Description

The HPC Data Center Operator will monitor, diagnose, and repair system faults on high-performance computing (HPC) systems, storage systems, and networks. Responsibilities include providing technical support; troubleshooting software, hardware, and networks; documenting issues; and resolving customer issues. The operator will also monitor data center facilities, participate in system decommissioning, and promote the use of inter-departmental resources. This role works the Owl Shift (12:00 AM-8:00 AM) and interacts with other staff to provide advanced technical support in a complex HPC environment.
Must have:
  • Monitor and diagnose system faults on HPC systems.
  • Troubleshoot software, hardware, and networks.

Job Details

Do you love High Performance Computing (HPC)? Would you like to work with four of the fastest HPC systems in the world?

We are looking for an HPC Data Center Operator to monitor, diagnose, and repair system faults on a large number of high-performance computing (HPC) systems, storage systems, and networks. You will interact with other Livermore Computing (LC) staff to remediate problems and provide advanced technical support in a complex HPC and networking environment, working the Owl Shift (12:00 AM-8:00 AM). This position is in the Livermore Computing Operations Group in the LC Division within the Computing Directorate.

This position will be filled at either the 525.2 or 525.3 level depending on your qualifications. Additional responsibilities (outlined below) will be assigned if you are selected at the higher level.

You will:

  • Provide broad technical support and monitoring capabilities for the HPC systems, file systems, and storage systems under minimal supervision.
  • Apply Unix system knowledge and use a variety of in-house and vendor-supplied diagnostic tools to monitor systems and perform basic repairs.
  • Troubleshoot moderately complex software, hardware, and network problems. Document issues, then apply corrective action and repairs or notify the appropriate on-call personnel.
  • Receive, document, and handle all customer calls, particularly during off-hours; resolve customer issues where possible or escalate to the appropriate level.
  • Perform data center facilities monitoring, problem remediation, and emergency event response during normal daily operation and off-hours.
  • Participate in decommissioning older HPC systems and in system relocation activities.
  • Promote the use of inter-departmental resources for tools, metrics, and common solutions to team members via email and presentations.
  • Perform a variety of technical tasks including installation, diagnosis, repair, and maintenance of clustered computer systems and related file systems and networks.
  • Perform other duties as assigned.

Additional job responsibilities at the 525.3 level:

  • Act as an escalation resource for advanced technical issues.
  • Be a change agent to improve current processes and procedures.
  • Act as a subject matter expert on specific hardware or procedures.


