Job Description
AI/ML Solutions Architect – Distributed Training & GPU Infrastructure
Location: Remote from anywhere in Europe
Total compensation up to EU 300k (base + variable), depending on level and experience
Join a fast-moving AI infrastructure team working on the cutting edge of large-scale ML workloads. This role is ideal for engineers who enjoy solving deep technical challenges in distributed training, multi-GPU systems, and scalable AI inference infrastructure. You will work directly with AI-focused clients, helping them get the most out of modern GPUs (H100, B200, etc.) and ML frameworks such as
PyTorch (and JAX in some environments).
Team & Responsibilities
Work alongside senior AI and infrastructure engineers building large-scale GPU platforms. As part of the customer solutions team, you will:
Design and validate
production-grade distributed training (primary) and large-scale inference architectures
on large GPU clusters, typically
tens to thousands of GPUs
Work hands-on with customers to
debug, optimize, and scale
ML workloads across multi-node GPU environments
Act as a
technical authority on GPU performance, networking, and schedulers , making trade-offs at scale and translating customer needs into concrete platform requirements
Collaborate closely with
engineering, product, and R&D
to influence roadmap decisions based on real-world ML workloads
This is a
hands-on, technical role ; you are expected to work directly in customer environments, not only advise at a high level
Required Skills
Hands-on experience designing and operating
production-grade, multi-node GPU workloads
for training or inference
Strong background in
distributed deep learning
(PyTorch Distributed, DeepSpeed) on GPU clusters
Deep understanding of
GPU architecture and interconnects
(H100/A100 class, NVLink, InfiniBand)
Experience with
Kubernetes or Slurm
and performance tuning using GPU profiling and monitoring tools
This role is
not a fit
if your experience is limited to single-node training, high-level AI strategy, or non-production research environments. We are looking for engineers and architects who thrive at the intersection of AI workloads and large-scale infrastructure.
Interested? Apply!
Ready to Apply?
Don't miss this opportunity! Apply now and join our team.
Job Details
Posted Date:
February 26, 2026
Job Type:
Construction
Location:
Indonesia
Company:
Doghouse Recruitment
Ready to Apply?
Don't miss this opportunity! Apply now and join our team.