Job Description
About Valiance
Valiance is a deep-tech AI company building sovereign, mission-critical AI solutions for enterprises, the public sector, and government institutions. From predictive maintenance and demand planning to sovereign AI for citizen services, we design systems that thrive in high-stakes environments. Recognized with the NASSCOM AI Game Changers Award and the Aegis Graham Bell Award, Valiance is a certified Google Cloud Partner whose 200+ engineers and data scientists are shaping the future of industries and societies through responsible AI.
The Role
We are looking for a Senior LLMOps Engineer who has taken LLM inference optimization from idea to production — not just proof of concept. You will own the end-to-end efficiency of our LLM inference infrastructure running on H200 GPUs, driving down cost and latency while maintaining the reliability our enterprise and government clients demand. This is a high-ownership, high-impact role on a team building some of India's most consequential AI systems.
What You Will Do
Design and operate production-grade LLM inference pipelines on H200 GPU clusters, optimizing for throughput, latency, and cost per token.
Evaluate and deploy small-to-medium open-source LLMs (e.g., Mistral, Llama, Phi, Gemma) as cost-efficient alternatives to large models without sacrificing output quality.
Tune and manage vLLM deployments — including continuous batching, PagedAttention, tensor parallelism, and quantization (GPTQ, AWQ, FP8) — in production environments.
Build and maintain model-serving APIs with robust observability: latency percentiles, GPU utilization, queue depths, and cost-per-request dashboards.
Architect Kubernetes-based autoscaling strategies for inference workloads, balancing cold-start penalties against cost at scale.
Run structured A/B experiments comparing model variants, quantization levels, and batching strategies using production traffic — not synthetic benchmarks.
Collaborate with applied ML engineers and solution architects to identify latency and cost bottlenecks across the model serving stack.
Establish and enforce SLOs for inference reliability, and build alerting and runbooks for production incidents.
What We Are Looking For
Non-Negotiables
3+ years of hands-on experience operating LLM inference in production — demonstrable cost and latency improvements, not POC results.
Deep expertise with vLLM in production: batching strategies, memory management, quantization tradeoffs.
Strong Python engineering skills — clean, testable, production-ready code.
Proficiency with Docker and Kubernetes for deploying and scaling GPU inference workloads.
Experience building and maintaining REST/gRPC APIs for model serving at scale.
Hands-on experience with open-source LLMs and the ability to evaluate model-quality vs. cost tradeoffs for real use cases.
Strong Advantages
Experience with GPU memory profiling and optimization (CUDA-level awareness a plus).
Familiarity with model distillation, speculative decoding, or FlashAttention implementations.
Exposure to multi-GPU and multi-node inference setups.
Experience with inference frameworks beyond vLLM: TGI, TensorRT-LLM, Triton Inference Server.
Familiarity with sovereign AI or air-gapped deployment constraints.
Why Valiance
You will work on AI systems that are actually deployed at scale — used by government institutions and large enterprises, not just demoed.
Direct access to H200 infrastructure with meaningful compute budgets — no GPU rationing.
A culture that rewards engineering depth and production ownership over slide decks.
Competitive compensation with performance-linked incentives.
Opportunity to define how Valiance builds its AI platform as we scale.
How to Apply
Upload your resume and a brief note on a specific inference optimization you shipped in production — the problem, your approach, and the measurable outcome. We do not conduct screening rounds for this role. Shortlisted candidates will move directly to a technical discussion with our engineering leadership.
Ready to Apply?
Don't miss this opportunity! Apply now and join our team.
Job Details
Posted Date:
February 26, 2026
Location:
India
Company:
Valiance Solutions