Deep Learning Performance Architect, CUTLASS DSL

NVIDIA

📍 Shanghai, China 💼 Full-time 🕒 Posted June 22, 2026


Job Description

Are you passionate about programming languages, compiler technology, and GPU performance? Do you want to help shape the future of high-performance kernel development for AI? We are looking for outstanding engineers to build CUTLASS DSL, a Python-native language for GPU kernel development, along with the MLIR dialects and lowering passes behind it. In this role, you will also help accelerate kernel compilation while delivering performance comparable to CUTLASS C++, enabling efficient hardware-software co-design for NVIDIA's next generation of AI platforms.

What you'll be doing:
+ Design, develop, and optimize CUTLASS DSL, a Python-native language for high-performance GPU kernel development
+ Build and advance the MLIR dialects, lowering passes, and code generation flows that power the CUTLASS DSL stack
+ Drive innovations that improve kernel compilation speed while maintaining performance on par with CUTLASS C++
+ Col...
            


Job Details

Location: Shanghai, China
Job Type: Full-time
Category: other-general
Posted Date: June 22, 2026
Application Deadline: June 27, 2026