Job Description

Location:

Bengaluru, Karnataka

About the Company:

Sustainability Economics.ai is a global organization pioneering the convergence of clean energy and AI, enabling profitable energy transitions while powering end-to-end AI infrastructure. By integrating AI-driven cloud solutions with sustainable energy, we create scalable, intelligent ecosystems that drive efficiency, innovation, and long-term impact across industries. Guided by exceptional leaders and visionaries with decades of expertise in finance, policy, technology, and innovation, we are committed to a sustained, long-term effort to fulfil this vision through technical innovation, client services, expertise, and capability expansion.

Role Summary:

We are seeking a highly skilled and innovative Inference Optimization Engineer (LLM and Runtime) to design, develop, and optimize cutting-edge AI systems that power intelligent, scalable, and agent-driven workflows. This role blends frontier generative AI research with robust engineering and requires expertise in machine learning, deep learning, and large language models (LLMs), along with awareness of the latest industry trends. The ideal candidate will collaborate with cross-functional teams to build production-ready AI solutions that address real-world business challenges while keeping our platforms at the forefront of AI innovation.

Key Tasks and Accountability:

Optimize and customize large-scale generative models (LLMs) for efficient inference and serving.

Apply and evaluate advanced model optimization techniques such as quantization, pruning, distillation, tensor parallelism, and caching strategies to enhance model efficiency, throughput, and inference performance.

Implement custom fine-tuning pipelines using parameter-efficient methods (LoRA, QLoRA, adapters, etc.) to achieve task-specific goals while minimizing compute overhead.

Optimize runtime performance of inference stacks using frameworks like vLLM, TensorRT-LLM, DeepSpeed-Inference, and Hugging Face Accelerate.

Design and implement scalable model-serving architectures on GPU clusters and cloud infrastructure (AWS, GCP, or Azure).

Work closely with platform and infrastructure teams to reduce latency, memory footprint, and cost-per-token during production inference.

Evaluate hardware–software co-optimization strategies across GPUs (NVIDIA A100/H100), TPUs, or custom accelerators.

Monitor and profile performance using tools such as Nsight, PyTorch Profiler, and Triton Metrics to drive continuous improvement.

Key Requirements:

Education & Experience

Ph.D. in Computer Science or a related field, with a specialization in Deep Learning, Generative AI, or Artificial Intelligence and Machine Learning (AI/ML).

2–3 years of hands-on experience in large language model (LLM) or deep learning optimization, gained through academic or industry work.

Skills

Strong analytical and mathematical reasoning ability with a focus on measurable performance gains.

Collaborative mindset, with the ability to work across research, engineering, and product teams.

Pragmatic problem-solver who values efficiency, reproducibility, and maintainable code over theoretical exploration.

Curiosity-driven attitude: keeps up with emerging model compression and inference technologies.

What You’ll Do

Take ownership of the end-to-end optimization lifecycle, from profiling bottlenecks to delivering production-optimized LLMs.

Develop custom inference pipelines capable of high throughput and low latency under real-world traffic.

Build and maintain internal libraries, wrappers, and benchmarking suites for continuous performance evaluation.

What You Will Bring

Hands-on experience building and optimizing machine learning or agentic systems at scale.

A builder’s mindset: bias toward action, comfort with experimentation, and enthusiasm for solving complex, open-ended challenges.

Startup DNA: bias to action, comfort with ambiguity, love of fast iteration, and a flexible growth mindset.

Why Join Us

Shape a first-of-its-kind AI + clean energy platform.

Work with a small, mission-driven team obsessed with impact.

An aggressive growth path.

A chance to leave your mark at the intersection of AI and sustainability.

Job Details

Posted Date: February 25, 2026

Job Type: Construction

Location: India

Company: Sustainability Economics.ai

Inference Optimization Engineer (LLM and Runtime)
