Senior Staff Engineer - AI/ML

Experience: 4–12 Years
Research & Development


About Marvell

Marvell’s semiconductor solutions are the essential building blocks of the data infrastructure that connects our world. Across enterprise, cloud and AI, automotive, and carrier architectures, our innovative technology is enabling new possibilities. At Marvell, you can affect the arc of individual lives, lift the trajectory of entire industries, and fuel the transformative potential of tomorrow. For those looking to make their mark on purposeful and enduring innovation, above and beyond fleeting trends, Marvell is a place to thrive, learn, and lead.

Your Team, Your Impact

Marvell is a global leader in high-performance silicon and AI infrastructure, powering the world’s most advanced datacenters, cloud platforms, 5G networks, and AI/ML workloads. Our innovations drive next-generation computing, networking, and storage, enabling breakthrough performance, scalability, and efficiency for the most demanding applications.

This team at Marvell develops Murals, a next-generation AI/ML infrastructure simulation and design platform that enables in-depth analysis and optimization of large-scale training and inference workloads. Leveraging trace-driven simulation, performance modeling, and hardware/software co-design, the team helps shape scalable and resilient solutions for advanced workloads such as LLMs, DLRMs, GenAI, and GNNs.
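
To make the trace-driven side of this concrete, the sketch below writes a toy trace in the Chrome Trace Event JSON format, which Perfetto and chrome://tracing (both named later in this posting) can open. The event names, timings, and file name are illustrative assumptions only, not the Murals trace schema.

```python
# Hypothetical sketch: emit a minimal trace in the Chrome Trace Event JSON format.
# Events, timings, and the output path are made up for illustration.

import json

def write_trace(path: str) -> None:
    events = [
        # "ph": "X" marks a complete event; "ts" and "dur" are in microseconds.
        {"name": "all_reduce", "ph": "X", "ts": 0,    "dur": 1200, "pid": 0, "tid": 0,
         "args": {"bytes": 1 << 20}},
        {"name": "gemm",       "ph": "X", "ts": 1200, "dur": 800,  "pid": 0, "tid": 1},
        {"name": "h2d_copy",   "ph": "X", "ts": 2000, "dur": 300,  "pid": 0, "tid": 2},
    ]
    with open(path, "w") as f:
        json.dump({"traceEvents": events, "displayTimeUnit": "ms"}, f, indent=2)

if __name__ == "__main__":
    write_trace("toy_trace.json")  # open in ui.perfetto.dev or chrome://tracing
```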

Working closely with system architects, hardware designers, and ML practitioners, the team explores innovative ways to optimize compute, memory, and networking subsystems across complex datacenter environments.

What You Can Expect

  • Simulation & Modeling – Implement workflows to study AI/ML workloads using trace-driven and analytical models.
  • Performance Analysis – Profile and analyze system bottlenecks across compute, memory, and network layers.
  • Networking Studies – Evaluate collective communication performance (all-reduce, all-to-all, reduce-scatter) across different topologies and fabrics; a first-order cost-model sketch follows this list.
  • Tooling & Automation – Develop utilities for trace generation, merging, conversion, and visualization.
  • Prototyping & Validation – Test distributed training and inference pipelines in simulated and real environments.
  • Hardware/Software Co-Design – Collaborate on emerging technologies (CXL, DPUs, NVLink, PCIe, UET/UEC, in-network compute).
  • Scaling Studies – Conduct performance projections and trade-off studies for next-gen AI infrastructure.
  • Knowledge Sharing – Document workflows, publish internal reports, and drive peer learning.
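
As referenced in the Networking Studies item, here is a minimal, hypothetical sketch of the kind of first-order analytical model such studies often start from: an alpha-beta estimate of ring all-reduce time. The function name, parameters, and example values are illustrative assumptions, not Marvell's Murals models.

```python
# Hypothetical sketch: alpha-beta cost model for ring all-reduce.
# 2*(p-1) steps; each step moves a 1/p chunk of the message over one link.

def ring_allreduce_time_us(msg_bytes: float,
                           num_ranks: int,
                           link_bandwidth_gbps_bytes: float,
                           link_latency_us: float) -> float:
    """Estimate ring all-reduce completion time in microseconds.

    link_bandwidth_gbps_bytes is per-link bandwidth in GB/s (bytes).
    """
    p = num_ranks
    steps = 2 * (p - 1)                              # reduce-scatter + all-gather phases
    bytes_per_step = msg_bytes / p                   # each chunk is 1/p of the message
    transfer_us = bytes_per_step / (link_bandwidth_gbps_bytes * 1e3)  # GB/s -> bytes/us
    return steps * (link_latency_us + transfer_us)


if __name__ == "__main__":
    # Example: 1 GiB gradient all-reduce across 64 ranks on 400 Gb/s (~50 GB/s) links.
    t = ring_allreduce_time_us(msg_bytes=2**30,
                               num_ranks=64,
                               link_bandwidth_gbps_bytes=50.0,
                               link_latency_us=2.0)
    print(f"Estimated ring all-reduce time: {t / 1e3:.2f} ms")
```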

What We're Looking For

  • Bachelor’s, Master’s, or PhD in Computer Science, Electrical Engineering, or related field with 4–12 years of relevant professional experience.
  • Strong foundation in computer architecture, distributed systems, AI/ML, and operating systems.
  • Solid networking fundamentals including TCP/IP, RDMA, RoCE, UET/UEC, and switching/routing.
  • Experience with simulation frameworks (e.g., Astra-Sim, Chakra, gem5, SST, NS-3).
  • Hands-on with PyTorch/TensorFlow and distributed training frameworks (DDP, Horovod, DeepSpeed); see the minimal DDP sketch after this list.
  • Strong programming skills in Python, C++, and scripting for automation.
  • Familiarity with interconnect and memory technologies (CXL, PCIe, NVLink, UAL).
  • Experience with profiling, telemetry, observability, and debugging tools.
  • Knowledge of collective communication algorithms and topology-aware scheduling.
  • Exposure to AI accelerators, memory disaggregation, DPUs, and custom silicon.
  • Familiarity with visualization tools (Perfetto, Chrome Tracing, Chakra Timeline, Flamegraphs).
  • Experience with large-scale AI training pipelines and scaling studies.
  • Interest in energy/performance trade-offs and resilience techniques.
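
As a deliberately minimal illustration of the distributed-training experience listed above, the sketch below shows a standard PyTorch DistributedDataParallel (DDP) loop. The model, data, and hyperparameters are placeholders, and it assumes a torchrun launch that sets LOCAL_RANK.

```python
# Hypothetical sketch of a PyTorch DDP training loop (placeholder model and data).
# Launch with: torchrun --nproc_per_node=<gpus> ddp_example.py

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")            # NCCL backend for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])          # set by torchrun
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = nn.Linear(1024, 1024).to(device)            # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for step in range(10):                               # toy training loop
        x = torch.randn(32, 1024, device=device)
        y = torch.randn(32, 1024, device=device)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()                                  # DDP all-reduces gradients here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```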

Additional Compensation and Benefit Elements

With competitive compensation and great benefits, you will enjoy our workstyle within an environment of shared collaboration, transparency, and inclusivity. We’re dedicated to giving our people the tools and resources they need to succeed in doing work that matters, and to grow and develop with us. For additional information on what it’s like to work at Marvell, visit our Careers page.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status.

Interview Integrity

As part of our commitment to fair and authentic hiring practices, we ask that candidates do not use AI tools (e.g., transcription apps, real-time answer generators like ChatGPT, Copilot, or note-taking bots) during interviews. Our interviews are designed to assess your personal experience, thought process, and communication skills in real-time. If a candidate uses such tools during an interview, they will be disqualified from the hiring process.

This position may require access to technology and/or software subject to U.S. export control laws and regulations, including the Export Administration Regulations (EAR). As such, applicants must be eligible to access export-controlled information as defined under applicable law. Marvell may be required to obtain export licensing approval from the U.S. Department of Commerce and/or the U.S. Department of State. Except for U.S. citizens, lawful permanent residents, or protected individuals as defined by 8 U.S.C. 1324b(a)(3), all applicants may be subject to an export license review process prior to employment.

#LI-MN1
