Job Description

Job Title: AI PRE Engineer

Location: Noida

Experience: 10-16 Years

AI PRE Engineer (Platform Reliability / Production Readiness Engineer)

The Role

An AI PRE Engineer ensures AI/ML platforms are production-ready, highly reliable, observable, secure, and cost-efficient, bridging AI engineering, SRE, DevOps, and MLOps disciplines.

Responsibilities:

Define and maintain production readiness standards across platform, data, model, application, and security layers.
Establish SLO/SLI frameworks for latency, availability, quality, safety, and drift, and implement error budget policies.
Publish reference architectures for LLM apps, RAG, vector stores, agent frameworks, and batch/stream inference.
Curate deployment blueprints (canary/shadow, blue-green, A/B) for models and prompts, with rollback guidance.
Standardize observability patterns for prompts, embeddings, latency, cost, quality, and safety telemetry.
Own capacity engineering (token/concurrency budgets, GPU/CPU sizing, vector scaling, cache hierarchies).
Define resilience patterns (timeouts, circuit breakers, fallbacks, idempotent retries, semantic/prompt caching).
Set AI security baselines (secrets, private networking, egress controls) and mandate red-team and safety evaluations.
Maintain compliance mappings (e.g., ISO 27001, SOC 2, GDPR/DPDP, HIPAA where applicable).
Provide CI/CD pipelines, SDKs, Helm/Terraform templates, and policy-as-code for consistent delivery.
Author PRR checklists, runbooks/playbooks, and DR/BCP blueprints (RTO/RPO, multi-region/site failover).
Drive enablement (trainings, brown-bags) and maintain knowledge repositories and decision records.
Partner with solution teams to validate architecture and non-functional requirements (scale, latency, cost, safety).
Conduct Production Readiness Reviews (PRRs) and certify releases across performance, security, privacy, and compliance.
Implement observability (tracing, metrics, logs), dashboards, and alerting on SLO burn and cost anomalies.
Experience with IDEs such as Jupyter Notebook, Visual Studio Code, and PyCharm.
Familiarity with AI-related libraries such as LangChain, PandasAI, and OpenAI.
Execute safe releases (canary/shadow/blue-green), prompt/model versioning, feature flags, and rollback plans.
Lead incident response for AI workloads; perform post-incident reviews and drive systemic fixes.
Govern token/cost budgets, autoscaling thresholds, and vector store performance for FinOps efficiency.

Qualifications & Experience

Bachelor’s degree in Computer Science, Engineering, or Information Technology.
Master’s degree in Systems Architecture, Cloud Computing, or AI-related disciplines is preferred.
9-14 years of overall IT or platform engineering experience.
5-7 years designing or managing enterprise platforms (AI, data, or cloud platforms).
3-5 years in architecture or platform strategy roles supporting multiple teams or business units.
Production readiness reviews, SLO/SLI/SLA design, incident management, RCA/postmortems, on-call support, and capacity planning for AI/ML platforms.
Hands-on experience with AWS/GCP/Azure, GPU-aware infrastructure, Infrastructure as Code (Terraform), Docker, Kubernetes (EKS/GKE/AKS), and managing large-scale, multi-tenant clusters.
Deploying ML/LLM workloads to production, model lifecycle management, RAG pipelines, safe rollouts (canary/shadow), rollback strategies, and managing inference scalability and latency.
Metrics, logging, tracing, and alerting using Prometheus/Grafana/OpenTelemetry or cloud-native tools; monitoring AI-specific signals such as model drift, latency, token usage, and GPU utilization.
Strong coding (Python/Go/Java), CI/CD pipelines (GitHub Actions, Jenkins), GitOps, automated reliability tooling, and security best practices (secrets management, access control, AI guardrails).

Certifications Required:

NVIDIA Certified Professional: AI Infrastructure & Operations
NVIDIA DLI – Deploying AI with Kubernetes & GPUs
NVIDIA DLI – Building AI Infrastructure with NVIDIA Technologies
Certified Kubernetes Administrator
Docker Certified Associate
Red Hat Certified System Administrator (RHCSA)
Linux Foundation Certified System Administrator (LFCS)

Ready to Apply?

Don't miss this opportunity! Apply now and join our team.


Job Details

Posted Date: February 26, 2026

Job Type: Construction

Location: India

Company: HCLTech
