The candidate will be responsible for understanding the characteristics of Deep Learning (DL) workloads and for measuring, analyzing, and projecting the power and performance of the latest DL workloads. This role requires a background in both software and hardware in order to perform sensitivity analysis of hardware knobs and improve DL workload performance. Experience with simulators, benchmarking DL models, and programming in Python/C++/CUDA/HIP/OpenCL is required. The ideal candidate will have experience analyzing and improving the performance of DL workloads running on accelerators, along with a solid understanding of computer architecture fundamentals.