Deep Learning Kernel Software Performance Architect

NVIDIA
📍 Shanghai, China 💼 Full-time 🕒 Posted February 26, 2026

Job Description

NVIDIA is seeking Software Performance Architects to optimize GPU kernel performance for state-of-the-art data-center platforms. We build automated, data-driven workflows to detect, explain, and prevent performance regressions across key deep learning workloads, partnering closely with kernel developers, compiler teams, infrastructure, and architecture/performance groups.


What you'll be doing:
+ Performance analysis and debugging
  + Validate and analyze the performance of GPU-accelerated kernels and key deep learning building blocks.
  + Debug performance issues end-to-end: reproduce, isolate root causes, propose fixes or mitigation paths, and drive closure with the owning teams.
  + Build performance narratives using structured evidence: baselines, controlled comparisons, and regression attribution.
+ Automation and regression infrastructure (Python-heavy)
  + Develop and maintain Python-based automation for performance testing and analysis, using modern AI-assisted ...

Job Details

  • Location: Shanghai, China
  • Job Type: Full-time
  • Category: other-general
  • Posted Date: February 26, 2026
  • Application Deadline: March 04, 2026