Job Description
Haptiq is a leader in AI-powered enterprise operations, delivering digital solutions and consulting services that drive value and transform businesses. We specialize in using advanced technology to streamline operations, improve efficiency, and unlock new revenue opportunities, particularly within the private capital markets.
Our integrated ecosystem includes:
PaaS (Platform as a Service): the Core Platform, an AI-native enterprise operations foundation built to optimize workflows, surface insights, and accelerate value creation across portfolios.
SaaS (Software as a Service): a cloud platform delivering unmatched performance, intelligence, and execution at scale.
S&C (Solutions and Consulting Suite): modular technology playbooks designed to manage, grow, and optimize company performance.
With over a decade of experience supporting high-growth companies and private equity-backed platforms, Haptiq brings deep domain expertise and a proven ability to turn technology into a strategic advantage.
About the Role
We're seeking a skilled MLOps / AIOps Engineer to lead the deployment, operation, and monitoring of AI services in production. You'll operate at the intersection of infrastructure engineering and AI systems, ensuring our AI-powered APIs, RAG pipelines, MCPs, and agentic services run reliably, securely, and at scale. You'll collaborate closely with ML Engineers, Python Developers, and AI Architects to design resilient infrastructure and operational workflows for distributed AI applications.
Key Responsibilities
Design, provision, and maintain infrastructure-as-code for AI service deployment (using tools like Terraform, Pulumi, or AWS CDK).
Build and manage CI/CD pipelines for deploying AI APIs, RAG pipelines, MCP services, and LLM agent workflows.
Implement and maintain operational and LLM observability through monitoring and alerting systems.
Track AI-specific operational metrics, including inference latency, error rates, drift detection, and hallucination monitoring.
Optimize inference workloads and manage distributed AI serving frameworks (Ray Serve, BentoML, vLLM, Hugging Face TGI, etc.).
Collaborate with ML Engineers and Python Developers to define scalable, secure, and automated deployment processes.
Enforce operational standards for AI system security, data governance, and compliance.
Stay current with evolving AIOps and LLM observability frameworks, integrating emerging tools and best practices into our stack.
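To give a flavor of the metrics-tracking work above, here is a minimal, dependency-free Python sketch of recording inference latency and error rate for an AI API. The `call_model` function is a hypothetical stand-in for a real serving endpoint (e.g. one backed by vLLM or Hugging Face TGI); a production setup would export these numbers to a monitoring system rather than return them directly.

```python
import time
import statistics

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real inference call; a production
    system would hit an LLM serving endpoint instead."""
    time.sleep(0.001)  # simulate inference work
    return f"response to: {prompt}"

def track_inference(prompts):
    """Record per-request latency and error rate, two of the
    AI-specific operational metrics listed above."""
    latencies, errors = [], 0
    for p in prompts:
        start = time.perf_counter()
        try:
            call_model(p)
        except Exception:
            errors += 1
        latencies.append(time.perf_counter() - start)
    return {
        "p50_latency_s": statistics.median(latencies),
        "max_latency_s": max(latencies),
        "error_rate": errors / len(prompts),
    }

metrics = track_inference(["a", "b", "c"])
print(metrics["error_rate"])  # 0.0 for this stub
```

In practice these gauges would be scraped by an alerting stack (e.g. Prometheus and Grafana) so that latency regressions or error-rate spikes page an on-call engineer.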
Required Skills & Experience
Proficiency with cloud infrastructure (AWS, Azure, or GCP) and container orchestration platforms (Docker, Kubernetes, ECS/EKS).
Hands-on experience deploying and managing AI/ML services in production.
Strong understanding of CI/CD pipelines for AI services, LLM workflows, and model deployments.
Experience working with distributed AI serving frameworks and inference optimization strategies.
Solid grasp of observability practices, operational monitoring, incident response, and AI-specific performance tracking.
Familiarity with defining and maintaining AI system health metrics, dashboards, and alerts.
Awareness of AI security considerations, data protection policies, and operational governance requirements.
Curiosity and openness to adopting emerging AIOps, LLM observability, and AI infrastructure tools.
Why Join Us?
We value creative problem solvers who learn fast, work well in an open and diverse environment, and enjoy pushing the bar for success ever higher. We do work hard, but we also choose to have fun while doing it.
Job ID 9472929001 | Posted on June 27, 2025
Can't find the right role? Email your resume to haptiq@jobflowsmail.com to be considered for new positions in the future.
Haptiq does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity or any other reason prohibited by law in provision of employment opportunities and benefits.