Senior Staff Engineer, Memory Fault Management Architect

3 Months ago • 10-15 Years • $177,100 PA - $282,900 PA

Job Summary

Job Description

Senior Staff Engineer specializing in memory fault management architecture. Requires 10+ years of experience in hardware fault management, reliability, data center fleet management, and strong knowledge of platform memory subsystem. Familiarity with Linux kernel, data center operating systems, and RAS (Reliability Availability Serviceability) is essential.
Must have:
  • Hardware Fault Mgmt
  • Data Center Fleet
  • Platform Memory
  • Linux Kernel
Good to have:
  • ECC Modules
  • RAS Algorithms
  • Industry Standards
  • Predictive Algorithms
Perks:
  • Paid Time Off
  • Fertility Care

Job Details

Please Note:

To provide the best candidate experience with our high application volumes, we limit applications to a total of 10 over 6 months. 

Advancing the World’s Technology Together
Our technology solutions power the tools you use every day--including smartphones, electric vehicles, hyperscale data centers, IoT devices, and so much more. Here, you’ll have an opportunity to be part of a global leader whose innovative designs are pushing the boundaries of what’s possible and powering the future. 

We believe innovation and growth are driven by an inclusive culture and a diverse workforce. We’re dedicated to empowering people to be their true selves. Together, we’re building a better tomorrow for our employees, customers, partners, and communities.

The Customer Quality & Reliability (Q&R) team is responsible for identifying major product quality and reliability issues as early as possible and helping to resolve them swiftly. You will be part of an incubation team within this organization working on in-field telemetry intended to transform the Customer Quality Experience for Samsung memory products. Fault Management is the future of quality: it minimizes system downtime in AI/ML hardware deployments and the workloads of the future. We analyze trends and patterns in telemetry from an enormous memory fleet to bucket failures and perform virtual root-cause analysis. Telemetry analysis helps us design solutions that proactively avoid system downtime. We conduct research and development both in-house and in collaboration with the industry, with the opportunity to publish our findings through whitepapers and conferences. We are looking for innovative and passionate thinkers who can work in a start-up environment and are excited to shape the future of data centers around the world. Join us in our mission!

What You'll Do

  • Interface with customers to establish the value of enabling in-field fault management and mitigation systems in order to improve the field failure rate of memory subsystems.
  • Deep dive into memory subsystem ECC (Error Correction Code) modules and evaluate the correction capabilities of field platforms.
  • Propose and develop platform fault management modules for memory subsystems.
  • Propose and develop platform RAS (Reliability Availability Serviceability) algorithms for memory fault management.
  • Contribute to defining industry standards for memory fleet telemetry and develop sophisticated predictive algorithms to manage hardware faults.
  • Stay up to date on the latest industry trends in the hardware fault management space: read technical papers, blogs, and conference talks, and publish whitepapers at conferences.
  • Drive alignment with multiple stakeholders and teams in the US and Korea.
  • Drive communication and interaction with internal and external customers regarding project goals and solutions.

Location: Hybrid, with at least 3 days in the San Jose, CA office and the remainder of the time working remotely

Job ID: 42363

What You Bring

  • Bachelor's with 15+ years of relevant industry experience, Master's with 13+ years, or PhD with 10+ years; experience in hardware fault management, reliability, data center fleet management, or a related technical field preferred.
  • Knowledge of the platform memory subsystem and platform RAS (Reliability Availability Serviceability).
  • Linux kernel commit experience.
  • Familiarity with data center operating systems and platform concepts (x86, ARM).
  • Project management with the ability to write, edit, clarify and maintain consolidated status in real-time.
  • Knowledge of the platform memory subsystem, from bare metal to OS-level transactions.
  • Strong analytical and problem-solving skills.
  • Excellent communication and interpersonal skills.
  • Ability to work independently and as part of a team.
  • You’re inclusive, adapting your style to the situation and diverse global norms of our people.
  • An avid learner, you approach challenges with curiosity and resilience, seeking data to help build understanding.
  • You’re collaborative, building relationships, humbly offering support and openly welcoming approaches.
  • Innovative and creative, you proactively explore new ideas and adapt quickly to change.

What We Offer
The pay range below is for all roles at this level across all US locations and functions. Individual pay rates depend on a number of factors—including the role’s function and location, as well as the individual’s knowledge, skills, experience, education, and training. We also offer incentive opportunities that reward employees based on individual and company performance. 

This is in addition to our diverse package of benefits centered around the wellbeing of our employees and their loved ones. In addition to the usual Medical/Dental/Vision/401k, our inclusive rewards plan empowers our people to care for their whole selves. An investment in your future is an investment in ours.

Give Back: With a charitable giving match and frequent opportunities to get involved, we take an active role in supporting the community.
Enjoy Time Away: You’ll start with 4+ weeks of paid time off a year, plus holidays and sick leave, to rest and recharge.
Care for Family: Whatever family means to you, we want to support you along the way, including a stipend for fertility care or adoption, medical travel support, and an errand service.
Prioritize Emotional Wellness: With on-demand apps and paid therapy sessions, you’ll have support no matter where you are.
Stay Fit: Eating well and being active are important parts of a healthy life. Our onsite Café and gym, plus virtual classes, make it easier.
Embrace Flexibility: Benefits are best when you have the space to use them. That’s why we facilitate a flexible environment so you can find the right balance for you.

Base Pay Range

$177,100 - $282,900 USD

Equal Opportunity Employment Policy 

Samsung Semiconductor takes pride in being an equal opportunity workplace dedicated to fostering an environment where all individuals feel valued and empowered to excel, regardless of race, religion, color, age, disability, sex, gender identity, sexual orientation, ancestry, genetic information, marital status, national origin, political affiliation, or veteran status.

When selecting team members, we prioritize talent and qualities such as humility, kindness, and dedication. We extend comprehensive accommodations throughout our recruiting processes for candidates with disabilities, long-term conditions, neurodivergent individuals, or those requiring pregnancy-related support. All candidates scheduled for an interview will receive guidance on requesting accommodations.

Recruiting Agency Policy

We do not accept unsolicited resumes. Only authorized recruitment agencies that have a current and valid agreement with Samsung Semiconductor, Inc. are permitted to submit resumes for any job openings.

Covid-19 Policy
To help keep our employees, customers, and communities safe, we’ve developed guidelines for our teams. Currently, we encourage vaccination for all employees and may require it depending on job functions (e.g., traveling for business, meeting with customers). While visiting our offices or attending team events, we ask employees to complete a daily health questionnaire and complete a weekly COVID test. Our COVID policies are subject to change depending on public health, regulatory and business circumstances. 

Applicant Privacy Policy
https://semiconductor.samsung.com/us/careers/privacy

 
