Sr. ML Kernel Performance Engineer, AWS Neuron, Annapurna Labs

๐Ÿ“ Toronto, Canada

Amazon Web Services (AWS)

Job Description

The Annapurna Labs team at AWS builds AWS Neuron, the software development kit used to accelerate deep learning and GenAI workloads on Amazon’s custom machine learning accelerators, Inferentia and Trainium. The Acceleration Kernel Library team focuses on maximizing performance for AWS’s custom ML accelerators. In this role, you will craft high-performance kernels for ML functions at the hardware-software boundary to ensure optimal performance for demanding workloads. You will work across frameworks, compilers, runtime, and collectives, contributing to future architecture designs and customer enablement.

This is an opportunity to work at the intersection of machine learning, high-performance computing, and distributed architectures, shaping the future of AI acceleration technology. You will work on cutting-edge products, architect and implement business-critical features, publish research, and mentor engineers in a small, agile team that values experimentation and learning. The team collaborates closely with customers on model enablement, providing optimization expertise for ML workloads on AWS accelerators.

Explore the product and our history:
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-cc/index.html
https://aws.amazon.com/machine-learning/neuron/
https://github.com/aws/aws-neuron-sdk
https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success

Key job responsibilities

- Design and implement high-performance compute kernels for ML operations, leveraging the Neuron architecture and programming models
- Analyze and optimize kernel-level performance across multiple generations of Neuron hardware
- Conduct detailed performance analysis using profiling tools to identify and resolve bottlenecks
- Implement compiler optimizations such as fusion, sharding, tiling, and scheduling
- Work directly with customers to enable and optimize their ML models on AWS accelerators
- Collaborate across teams to develop innovative kernel optimization techniques

A day in the life

As you design and code solutions to drive efficiencies in software architecture, you’ll create metrics, implement automation and other improvements, and resolve root causes of software defects. You’ll build high-impact solutions for a large customer base, participate in design discussions and code reviews, and work cross-functionally to drive business decisions with your technical input. You’ll thrive in a startup-like development environment focused on the most important work.

About The Team

Diversity of experiences is valued; candidates who do not meet every qualification are encouraged to apply.

Why AWS:
- AWS is a leading cloud platform, trusted by companies ranging from startups to the Global 500.
- Inclusive team culture, with employee affinity groups and leadership principles that guide collaboration.
- Work/life balance with flexible hours.
- Mentorship and career growth opportunities.

Basic Qualifications

- 5+ years of non-internship professional software development experience
- 5+ years of programming with at least one programming language
- 5+ years of leading design or architecture of systems
- Experience as a mentor, tech lead, or leading an engineering team

Preferred Qualifications

- 5+ years of full software development lifecycle experience
- Bachelor’s degree in computer science or equivalent
- Expertise in accelerator architectures for ML or HPC (GPUs, CPUs, FPGAs, or custom)
- Experience with GPU kernels and backends (CUDA, OpenCL, SYCL, ROCm, etc.)
- Experience with NVIDIA PTX and/or AMD GPU ISA
- Experience developing high performance libraries for HPC
- Proficiency in low-level GPU performance optimization
- Experience with LLVM/MLIR backend development for GPUs
- Knowledge of ML frameworks (PyTorch, TensorFlow) and their GPU backends
- Experience with parallel programming and optimization techniques
- Understanding of GPU memory hierarchies and optimization strategies

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status. If you require a workplace accommodation during the application or hiring process, please visit amazon.jobs/accommodations for more information.

Company - Amazon Development Centre Canada ULC
Job ID: A3059954

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Information Technology, Consulting, and Engineering

Industries

IT Services and IT Consulting

Ready to Apply?

Don't miss this opportunity! Apply now and join our team.

Job Details

Posted Date: December 13, 2025
Job Type: Full-time
Location: Toronto, Canada
Company: Amazon Web Services (AWS)