Job Description
Why this role matters
We’re building production-grade GenAI systems at the core of trading, risk, and enterprise decision-making. This is not a demo lab or a prompt-only role—you’ll design and ship real LLM platforms used by traders, compliance teams, and leadership every day.
If you enjoy solving hard problems at the intersection of LLMs, distributed systems, security, and real-world constraints, this role gives you scale, ownership, and impact.
What you’ll build
As a Senior LLM Engineer, you’ll own the end-to-end lifecycle of Large Language Model–powered systems—from architecture to production operations.
You’ll work on:
LLM-native platforms for trading intelligence, risk signals, compliance automation, fraud detection, and enterprise knowledge systems
RAG and agentic workflows that integrate market data, internal systems, and secure tools
High-performance, cost-aware inference stacks running in hybrid and multi-cloud environments
Evaluation, safety, and governance frameworks that make LLMs reliable in regulated financial environments
What you’ll do
Architecture & Engineering
Design and ship end-to-end LLM systems: prompt orchestration, RAG pipelines, agents/tool-use, and workflow automation
Build low-latency, scalable APIs and microservices for inference and orchestration
Implement LLMOps/MLOps pipelines: model versioning, prompt CI/CD, experiment tracking, and automated A/B testing
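To give a flavor of the prompt CI/CD and A/B testing work above, here is a minimal sketch (all names and templates hypothetical) of versioned prompts with deterministic traffic splitting, so a given user always lands in the same experiment arm:

```python
import hashlib

# Hypothetical in-memory prompt registry: version -> template.
# In production this would live in a versioned store behind CI/CD.
PROMPTS = {
    "v1": "Summarize the following trade note:\n{text}",
    "v2": "You are a trading analyst. Summarize concisely:\n{text}",
}

def assign_variant(user_id: str, variants=("v1", "v2"), split=0.5) -> str:
    """Deterministically assign a user to a prompt variant.

    Hashing the user id gives a stable bucket, so the same user
    sees the same variant for the duration of the A/B test.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 1000
    return variants[0] if bucket < split * 1000 else variants[1]

def render(user_id: str, text: str) -> str:
    """Render the prompt for whichever variant the user is assigned."""
    version = assign_variant(user_id)
    return PROMPTS[version].format(text=text)
```

Experiment tracking would then log the assigned version alongside each completion so metrics can be compared per variant.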
Data & Retrieval Systems
Engineer secure ingestion pipelines for market data, trades, compliance records, and voice/text transcripts
Design vector search systems with smart chunking, hybrid retrieval (BM25 + embeddings), re-ranking, and auditability
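As a sketch of the hybrid-retrieval idea above, reciprocal rank fusion is one common way to merge a BM25 ranking with an embedding-based ranking before re-ranking (document ids here are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each list contributes 1 / (k + rank + 1) per document, so documents
    that rank well in both lexical (BM25) and semantic (embedding)
    search rise to the top. k=60 is a conventional damping constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked outputs from the two retrievers:
bm25_hits = ["doc1", "doc2", "doc3"]
embedding_hits = ["doc2", "doc3", "doc1"]
fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
```

A cross-encoder re-ranker would then score only the fused top-N, and the fused ranking itself is easy to log for auditability.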
Evaluation, Safety & Model Risk
Define what "good" looks like: hallucination detection, task-level metrics, adversarial testing, and red-teaming
Build human-in-the-loop validation, bias/fairness checks, and explainability where required
Help formalize model risk management for LLMs in production
Security & Compliance (Done Right)
Work with Security and Risk teams to embed Zero Trust, least-privilege access, PII controls, and audit trails
Implement guardrails: content filters, policy enforcement, safe tool invocation, and full traceability
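The safe-tool-invocation and traceability points above can be sketched as an allowlist gate that records every attempt, allowed or denied (tool names, payloads, and the audit structure are all hypothetical):

```python
# Hypothetical tool registry and policy allowlist.
TOOLS = {
    "get_quote": lambda symbol: {"symbol": symbol, "price": 101.5},
}
ALLOWED_TOOLS = set(TOOLS)

AUDIT_LOG = []  # in production: an append-only, tamper-evident store

def invoke_tool(name, user, **kwargs):
    """Policy-enforced tool invocation.

    Only allowlisted tools run, and every attempt is recorded for
    audit before the policy decision is enforced, so denied calls
    leave a trace too.
    """
    allowed = name in ALLOWED_TOOLS
    AUDIT_LOG.append({"user": user, "tool": name, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"tool {name!r} is not allowlisted")
    return TOOLS[name](**kwargs)
```

Content filters and PII redaction would sit in front of this gate; the point here is only that enforcement and logging happen in one choke point the agent cannot bypass.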
Observability & Cost Engineering
Instrument everything: latency, throughput, token usage, prompt drift, errors, and SLOs
Actively optimize cost using model selection, quantization, caching, batching, and autoscaling
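On the caching lever mentioned above, a minimal sketch (class and method names hypothetical) of an LRU response cache keyed on (model, prompt), so identical requests never pay for the same tokens twice:

```python
import hashlib
from collections import OrderedDict

class PromptCache:
    """Tiny LRU cache for model responses, keyed on (model, prompt).

    Hit/miss counters double as cheap observability inputs for
    tracking how much spend the cache is actually saving.
    """

    def __init__(self, max_entries=1024):
        self._store = OrderedDict()
        self.max_entries = max_entries
        self.hits = 0
        self.misses = 0

    def _key(self, model, prompt):
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model, prompt, call):
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # refresh LRU position
            return self._store[key]
        self.misses += 1
        response = call(prompt)  # the model call is only paid on a miss
        self._store[key] = response
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
        return response
```

Exact-match caching like this only helps for repeated prompts; semantic caching, batching, and model routing cover the rest of the cost surface.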
Technical Leadership
Partner with Trading, Risk, Compliance, Infra, and Platform teams
Mentor engineers on LLM patterns, prompt engineering, and production best practices
Contribute to internal platforms, reusable components, and engineering standards
What we’re looking for
Experience
5–10+ years in ML / platform / backend engineering
3+ years building and operating production LLM systems
Strong hands-on skills in:
Languages: Python (primary), plus TypeScript / Go / Java
LLM frameworks: LangChain, LlamaIndex, Semantic Kernel, DSPy (or similar)
RAG systems: FAISS, Milvus, Pinecone; hybrid search; cross-encoder re-ranking
Model ecosystems: OpenAI / Azure OpenAI, Anthropic, Vertex AI; open-source (Llama, Mistral, Phi)
Inference optimization: vLLM, Triton, quantization (GPTQ/AWQ), batching, caching
LLMOps/MLOps: MLflow or W&B, model registries, CI/CD, feature stores
Cloud & infra: Docker, Kubernetes, Terraform, event-driven systems
Observability: Prometheus, Grafana, OpenTelemetry (metrics, logs, traces)
Security fundamentals: OAuth/OIDC, RBAC, secrets management, encryption, data governance
Why you’ll enjoy working here
Real production impact—no toy demos
Hard engineering problems with clear ownership and scale
Freedom to choose the right models and architectures, not just the trendy ones
A chance to define how GenAI is done responsibly in financial services