Senior SRE: AI/ML HPC Infra & GPU Cluster

📍 Toronto, Canada

Technology Boson AI

Job Description

A technology company in Toronto seeks a Senior Site Reliability Engineer to manage and optimize its HPC infrastructure. In this role, you'll ensure smooth operations of a powerful GPU cluster, deploy infrastructure-as-code solutions, and support ML teams. Candidates should have extensive SRE experience, proficiency in Linux, and familiarity with Kubernetes and Ceph storage. This position offers the chance to work with cutting-edge technology in a collaborative environment, perfect for problem-solvers who love learning.

Ready to Apply?

Don't miss this opportunity! Apply now and join our team.

Job Details

Posted Date: February 24, 2026
Job Type: Technology
Location: Toronto, Canada
Company: Boson AI