Scientist 4, Data Science, 6+ Years, LLM, (Python / TensorFlow / PyTorch), (LangChain / LangGraph), Data Structures
Western Digital
Job Summary
Western Digital is seeking a Principal ML Engineer to lead large-scale machine learning programs. This role involves end-to-end design, architecture, and implementation of ML initiatives, defining success metrics, and driving a metrics-driven culture. The Principal ML Engineer will mentor junior data scientists, foster innovation, and manage cross-functional stakeholders. The position requires deep technical expertise in ML/AI, strategic thinking, and leadership to translate ambiguous business challenges into concrete, metric-driven ML solutions that deliver measurable business value.
Must Have
- Lead end-to-end design, architecture, and implementation of large-scale machine learning programs.
- Own the technical vision and roadmap for ML initiatives.
- Drive solutioning for complex, ambiguous problems.
- Establish best practices and architectural standards for ML systems.
- Define success metrics and KPIs for ML initiatives.
- Mentor and guide junior and mid-level data scientists and ML engineers.
- Build and maintain strong relationships with cross-functional partners.
- PhD or Master's degree in Computer Science, Machine Learning, Statistics, Mathematics, or related quantitative field.
- 8+ years of hands-on experience in machine learning, data science, or related fields.
- 4+ years of experience leading technical projects or programs.
- Proven track record of deploying ML models/LLM Agents to production at scale.
- Expert-level proficiency in machine learning frameworks (TensorFlow, PyTorch).
- Deep understanding of ML fundamentals (supervised/unsupervised learning, deep learning, reinforcement learning, causal inference, optimization, statistical modeling).
- Strong software engineering skills with proficiency in Python.
- Experience with ML infrastructure and MLOps.
- Proficiency with big data technologies (Spark, Hadoop).
- Experience with cloud platforms (AWS, GCP, Azure) and containerization (Docker, Kubernetes).
- Strong understanding of algorithms, data structures, and system design principles.
- Experience in specialized LLM applications (conversational AI, code assistants, information extraction, content generation, autonomous decision-making systems).
- Experience building complex multi-agent systems.
- Hands-on experience with instruction tuning, preference learning (RLHF/DPO), or continued pretraining of LLMs.
- Demonstrated ability to lead and influence without direct authority.
- Exceptional communication skills.
- Proven stakeholder management experience.
- Strong analytical and problem-solving skills.
Good to Have
- Experience in one or more specialized domains: NLP, computer vision, recommendation systems, time series forecasting, ranking, or LLMs/generative AI.
- Publications in top-tier conferences (NeurIPS, ICML, ICLR, KDD, CVPR, ACL, etc.) or journals.
- Experience building and scaling ML/LLM platforms or infrastructure.
- Background in experimentation design and causal inference methodologies.
- Contributions to open-source ML projects or communities.
- Experience working in high-growth technology companies or FAANG environments.
- Track record of patent filings or granted patents in ML/AI.
- Familiarity with ML model governance, fairness, and responsible AI practices specifically for generative AI.
Job Description
Key Responsibilities
Technical Leadership & Program Ownership
- Lead the end-to-end design, architecture, and implementation of large-scale machine learning programs involving multiple interconnected projects
- Own the technical vision and roadmap for ML initiatives across the organization, ensuring alignment with business objectives
- Drive solutioning efforts for complex, ambiguous problems by breaking them down into actionable technical components
- Establish best practices, design patterns, and architectural standards for ML systems at scale
- Make critical technical decisions on model selection, infrastructure, tooling, and deployment strategies
- Champion production excellence by ensuring ML systems are reliable, scalable, maintainable, and cost-efficient
Goals & Metrics Ownership
- Define success metrics and KPIs for ML initiatives, establishing clear linkage between technical work and business outcomes
- Drive a metrics-driven culture by implementing comprehensive monitoring, experimentation frameworks, and impact measurement systems
- Analyze and communicate the business impact of ML solutions through rigorous A/B testing and causal inference methodologies
- Set and track ambitious yet achievable goals for your programs, proactively identifying and mitigating risks
- Translate business objectives into quantifiable ML objectives and success criteria
Mentorship & Team Development
- Mentor and guide junior and mid-level data scientists and ML engineers, accelerating their technical growth and career development
- Conduct code reviews, design reviews, and provide constructive feedback to elevate team quality standards
- Foster a culture of technical excellence, innovation, and continuous learning within the team
- Develop and deliver technical training sessions on advanced ML topics, tools, and methodologies
- Help shape hiring standards and participate actively in recruiting top ML talent
Stakeholder Management & Communication
- Build and maintain strong relationships with cross-functional partners including product managers, engineers, executives, and business stakeholders
- Communicate complex technical concepts and results to non-technical audiences through compelling data storytelling
- Present strategic recommendations and technical proposals to senior leadership and executive teams
- Navigate organizational complexity to drive alignment and consensus across multiple stakeholders
- Proactively manage expectations and communicate risks, tradeoffs, and dependencies clearly
Innovation & Research
- Stay at the forefront of ML/AI research and identify opportunities to apply cutting-edge techniques to business problems
- Share findings through internal tech talks, external conferences, or academic papers (optional)
- Drive innovation through rapid prototyping, experimentation, and willingness to challenge conventional approaches
- Balance innovation with pragmatism, knowing when to leverage proven solutions versus exploring novel approaches
Qualifications
Education & Experience
- PhD or Master's degree in Computer Science, Machine Learning, Statistics, Mathematics, or related quantitative field (or equivalent practical experience)
- 8+ years of hands-on experience in machine learning, data science, or related fields
- 4+ years of experience leading technical projects or programs with demonstrated business impact
- Proven track record of deploying ML models/LLM Agents to production at scale
Technical Expertise
- Expert-level proficiency in machine learning frameworks (TensorFlow, PyTorch)
- Deep understanding of ML fundamentals: supervised/unsupervised learning, deep learning, reinforcement learning, causal inference, optimization, and statistical modeling
- Strong software engineering skills with proficiency in Python and experience with production-grade code development
- Experience with knowledge graph integration, structured data extraction, or enterprise search systems
- Extensive experience with ML infrastructure and MLOps: model serving, monitoring, experimentation platforms, feature stores, and model registry
- Proficiency with big data technologies (Spark, Hadoop, distributed computing frameworks)
- Experience with cloud platforms (AWS, GCP, Azure) and containerization (Docker, Kubernetes)
- Strong understanding of algorithms, data structures, and system design principles
LLM & Agent Specialization
- Experience in specialized applications: conversational AI, code assistants, information extraction, content generation, or autonomous decision-making systems
- Experience building complex multi-agent systems with inter-agent communication and coordination
- Hands-on experience with instruction tuning, preference learning (RLHF/DPO), or continued pretraining of LLMs
- Experience with LLM observability and monitoring tools (LangSmith, Weights & Biases, Phoenix, or similar)
- Knowledge of emerging agent architectures and research (Tree of Thoughts, ReWOO, Reflexion, etc.)
- Experience with code generation models and AI-assisted development tools
- Familiarity with multimodal LLMs and vision-language models
Leadership & Soft Skills
- Demonstrated ability to lead and influence without direct authority across organizational boundaries
- Exceptional communication skills with the ability to distill complex technical concepts for diverse audiences
- Proven stakeholder management experience with senior leadership and cross-functional teams
- Strong analytical and problem-solving skills with attention to detail and business acumen
- Self-starter with the ability to operate autonomously in ambiguous environments
- Track record of mentoring and developing technical talent
Preferred Qualifications
- Experience in one or more specialized domains: NLP, computer vision, recommendation systems, time series forecasting, ranking, or LLMs/generative AI
- Publications in top-tier conferences (NeurIPS, ICML, ICLR, KDD, CVPR, ACL, etc.) or journals
- Experience building and scaling ML/LLM platforms or infrastructure
- Background in experimentation design and causal inference methodologies
- Contributions to open-source ML projects or communities
- Experience working in high-growth technology companies or FAANG environments
- Track record of patent filings or granted patents in ML/AI
- Familiarity with ML model governance, fairness, and responsible AI practices specifically for generative AI
Additional Information
We are seeking an exceptional Principal ML Engineer to join our team as a technical leader who will drive innovation and excellence in machine learning at scale. In this role, you will lead the solutioning and execution of large, complex programs spanning multiple projects while establishing technical direction and mentoring the next generation of ML talent.
This is a high-impact role requiring a unique blend of deep technical expertise, strategic thinking, stakeholder management, and leadership capabilities. You will be responsible for translating ambiguous business challenges into concrete, metric-driven ML solutions that deliver measurable business value.
Western Digital thrives on the power and potential of diversity. As a global company, we believe the most effective way to embrace the diversity of our customers and communities is to mirror it from within. We believe the fusion of various perspectives results in the best outcomes for our employees, our company, our customers, and the world around us. We are committed to an inclusive environment where every individual can thrive through a sense of belonging, respect and contribution.
Western Digital is committed to offering opportunities to applicants with disabilities and ensuring all candidates can successfully navigate our careers website and our hiring process. Please contact us at jobs.accommodations@wdc.com to advise us of your accommodation request. In your email, please include a description of the specific accommodation you are requesting as well as the job title and requisition number of the position for which you are applying.
Notice To Candidates: Please be aware that Western Digital and its subsidiaries will never request payment as a condition for applying for a position or receiving an offer of employment. Should you encounter any such requests, please report them immediately to the Western Digital Ethics Helpline or email compliance@wdc.com.