Job Description
We are seeking an experienced Solutions Architect with deep expertise in AI/ML infrastructure, High Performance Computing (HPC), and container platforms to join our dynamic team focused on delivering Cloud AI and Enterprise AI Factory solutions. This role is instrumental in architecting, deploying, and optimizing private cloud environments that support enterprise‑grade AI workloads at scale, leveraging validated reference architectures and industry‑standard frameworks.
The ideal candidate will bring strong technical expertise in AI infrastructure, container orchestration platforms, and hybrid cloud environments, and will play a key role in delivering scalable, secure, and high‑performance AI platform solutions.
Key Responsibilities
1. Leadership and Strategy
Provide delivery assurance and serve as the lead design authority for enterprise‑grade container platforms such as Red Hat OpenShift and SUSE Rancher, aligned with customer AI/ML strategies and business objectives.
Align solution architecture with modern Enterprise AI Factory design principles including modular scalability, GPU optimization, and hybrid cloud orchestration.
Oversee planning, risk management, and stakeholder alignment throughout the project lifecycle.
2. Solution Planning and Design
Architect and optimize end‑to‑end solutions across container orchestration and HPC workload management, leveraging platforms such as Red Hat OpenShift, SUSE Rancher, and workload schedulers like Slurm and Altair PBS Pro.
Ensure seamless integration of container and AI platforms with the broader software ecosystem, including open‑source DevOps and AI/ML tools and frameworks.
3. Opportunity Assessment
Lead technical responses to RFPs, RFIs, and customer‑driven inquiries.
Conduct Proof‑of‑Concept (PoC) engagements to validate performance, feasibility, and integration.
Assess customer environments and recommend optimal configurations based on validated industry reference architectures and open‑source integrations.
4. Innovation and Research
Stay current with emerging technologies, industry trends, and best practices across HPC, Kubernetes, container platforms, hybrid cloud, and security domains.
5. Customer‑Centric Mindset
Serve as a trusted advisor to enterprise customers, aligning AI solutions with business objectives.
Translate complex technical concepts into clear value propositions for technical and non‑technical stakeholders.
6. Team Collaboration
Collaborate with cross‑functional teams, including experts in infrastructure components such as servers, storage, networking, and data science, to ensure cohesive delivery.
Mentor technical consultants and contribute to internal knowledge‑sharing sessions, tech talks, and innovation initiatives.
Required Skills
1. HPC & AI Infrastructure
Extensive knowledge of HPC technologies and workload schedulers such as Slurm and Altair PBS Pro.
Experience with HPC cluster management tools.
Strong understanding of high‑speed networking technologies such as InfiniBand and Ethernet.
Experience with performance tuning of HPC components.
2. Containerization & Orchestration
Hands‑on experience with container technologies: Docker, Podman, Singularity.
Proficient in at least two container orchestration platforms:
CNCF Kubernetes
Red Hat OpenShift
SUSE Rancher
RKE / K3s
Canonical Charmed Kubernetes
Strong understanding of GPU‑based workload environments, including GPU health and performance monitoring frameworks.
3. Operating Systems & Virtualization
Strong Linux system administration skills: package management, boot processes, troubleshooting, performance tuning, networking.
Hands‑on experience with at least two Linux distributions: RHEL, SLES, Ubuntu.
Experience with virtualization technologies such as KVM and enterprise virtualization for hybrid cloud deployments.
4. Cloud, DevOps & MLOps
Solid understanding of hybrid cloud deployments, cloud architecture patterns, and cloud automation frameworks.
Experience with CI/CD, infrastructure‑as‑code, and automation tooling.
Skills
Mandatory Skills:
Azure Cloud Architecture
Cloud Solution Architecture
Kubernetes
Good to Have Skills:
Azure DevOps
Network Migration