HPC Data Center Operator

4 Hours ago • All levels

Job Summary

Job Description

The HPC Data Center Operator will monitor, diagnose, and repair system faults on high-performance computing (HPC) systems, storage systems, and networks. Responsibilities include providing technical support; troubleshooting software, hardware, and networks; documenting issues; and resolving customer issues. The operator will also monitor data center facilities, participate in system decommissioning, and promote the use of inter-departmental resources. This role works the Owl Shift (12:00 AM-8:00 AM) and interacts with other staff to provide advanced technical support in a complex HPC environment.
Must have:
  • Monitor and diagnose system faults on HPC systems.
  • Troubleshoot software, hardware, and networks.

Job Details

Do you love High Performance Computing (HPC)? Would you like to work with four of the fastest HPC systems in the world?

We are looking for an HPC Data Center Operator to monitor, diagnose, and repair system faults on a large number of high-performance computing (HPC) systems, storage systems, and networks. You will work with other Livermore Computing (LC) staff to remediate problems and provide advanced technical support in a complex HPC and networking environment, working the Owl Shift (12:00 AM-8:00 AM). This position is in the Livermore Computing Operations Group in the LC Division within the Computing Directorate.

This position will be filled at either the 525.2 or 525.3 level depending on your qualifications. Additional responsibilities (outlined below) will be assigned if you are selected at the higher level.

You will 

  • Provide broad technical support and monitoring capabilities for the HPC systems, file systems, and storage systems under minimal supervision.
  • Apply Unix system knowledge and use a variety of in-house and vendor-supplied diagnostic tools to monitor systems and effect basic repairs.
  • Troubleshoot moderately complex software, hardware, and network problems. Document issues, then apply corrective actions and repairs or notify the appropriate on-call personnel.
  • Receive, document, and accommodate all customer calls, particularly during off-hours, and resolve customer issues if possible, or escalate to the appropriate level.
  • Perform data center facilities monitoring, problem remediation, and emergency event response during normal daily operation and off-hours.
  • Participate in decommissioning older HPC systems and in system relocation activities.
  • Promote the use of inter-departmental resources for tools, metrics, and common solutions to team members via email and presentations.
  • Perform a variety of technical tasks including installation, diagnosis, repair and maintenance of clustered computer systems and related file systems and networks. 
  • Perform other duties as assigned.

Additional job responsibilities at the 525.3 level

  • Act as an escalation resource for advanced technical issues.
  • Be a change agent to improve current processes and procedures.
  • Act as a subject matter expert on specific hardware or procedures.
