Job Description
About Us
Staq is a leading Banking-as-a-Service (BaaS) and embedded finance platform, transforming the way businesses integrate banking and financial services. At Staq, we empower our clients to innovate, expand, and streamline their financial services offerings, leveraging our cutting-edge platform. Our mission is to bridge the gap between traditional banking and the digital era, providing seamless, scalable, and secure financial solutions.
The Role
We are building the intelligence layer that powers an AI-powered financial assistant and serves as the SDK that other banking applications plug into. The long-term vision is an AI-native bank in which every customer interaction, recommendation, and financial operation is orchestrated through this platform. That means the agent runtime, automation engine, recommendation systems, and tool execution framework must all be built as reusable, production-grade infrastructure, not as one-off features for a single product. Your objective is to build, harden, and ship this intelligence platform across multiple products simultaneously.

You will be building the systems that make AI actually work in finance: agents that reason about money, automations that run reliably on people's financial data, recommendations that are genuinely useful, and tool execution that is safe and observable. This is systems engineering meets applied AI.
Key Responsibilities
Agent Runtime & Orchestration
Build and maintain production AI agent flows using Python and LangGraph, including multi-step planning, tool selection, and context assembly
Author and evolve Agent Cards that define agent capabilities, context requirements, and output contracts for each product domain
Implement the agent-side integration with Temporal workflows — the AGENT STEP and AGENT LOOP activity interfaces that the Java orchestrator calls into
Own prompt engineering, template management, and context window optimization across all agent flows
Design and implement memory systems that give agents meaningful continuity — conversation history, user financial context, and long-term preference tracking across sessions
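To give a flavor of the kind of contract work involved, here is a minimal Python sketch of what an Agent Card might look like as a declarative contract. All names and fields here are hypothetical illustrations, not the actual platform schema:

```python
from dataclasses import dataclass

# Hypothetical sketch of an Agent Card: a declarative contract describing
# what an agent can do, what context it needs, and what it must return.
@dataclass(frozen=True)
class AgentCard:
    name: str
    capabilities: list          # tool names the agent may invoke
    context_requirements: list  # data the runtime must assemble before a run
    output_schema: dict         # JSON-schema-like contract for agent output

    def validate_output(self, output: dict) -> bool:
        # Minimal contract check: every required field must be present.
        return all(key in output for key in self.output_schema.get("required", []))

card = AgentCard(
    name="spending-insights",
    capabilities=["fetch_transactions", "categorize_spend"],
    context_requirements=["conversation_history", "user_financial_profile"],
    output_schema={"required": ["summary", "recommendations"]},
)

print(card.validate_output({"summary": "...", "recommendations": []}))  # True
```

In practice the runtime would use such a card to assemble context before a LangGraph run and to enforce the output contract afterward.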
Automation & Intelligent Workflows
Design and implement automation flows that go beyond conversational agents — scheduled financial health checks, proactive alerting, background data analysis, and event-driven triggers
Build reliable, deterministic automation pipelines that can execute multi-step financial operations with proper error handling, compensation logic, and human-in-the-loop escalation
Ensure automations are idempotent, observable, and operate within the platform’s risk gate framework
Recommendation Systems
Build and iterate on recommendation engines that surface personalized financial insights, product suggestions, and actionable next-best-actions to users
Design the data contracts and feature pipelines that feed recommendations, working with domain services for banking, credit, and subscription data
Implement evaluation frameworks to measure recommendation quality, relevance, and user engagement
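As one concrete example of such an evaluation framework, an offline metric like precision@k can score recommendations against items the user actually engaged with. The recommendation names below are invented for illustration:

```python
# Hypothetical sketch of an offline recommendation metric: precision@k
# against the set of items the user actually engaged with.
def precision_at_k(recommended, engaged, k=3):
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in engaged)
    return hits / len(top_k)

recs = ["raise_savings_rate", "cancel_unused_sub", "refinance_loan", "open_cd"]
engaged = {"cancel_unused_sub", "open_cd"}
print(precision_at_k(recs, engaged, k=3))  # 1 engaged item in the top 3
```

Production evaluation would combine several such metrics (relevance, coverage, engagement) and track them over time, but the shape of the problem is the same.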
Sandboxed Tool Execution
Own the integration with sandboxed execution environments (E2B) where agents run tools against real financial APIs and data sources
Implement and maintain MCP (Model Context Protocol) tool definitions, ensuring agents can safely invoke financial operations within policy-controlled boundaries
Build guardrails around tool execution — input validation, output verification, and safe fallback behavior when tools fail or return unexpected results
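The guardrail pattern above, validate inputs, verify outputs, fall back safely, can be sketched as a generic wrapper. The tool and field names here are hypothetical:

```python
# Hypothetical sketch of a tool-execution guardrail: validate inputs, verify
# outputs, and fall back safely when the tool fails or returns something odd.
def guarded_call(tool, args: dict, validate_input, verify_output, fallback):
    if not validate_input(args):
        return fallback("invalid input")
    try:
        result = tool(**args)
    except Exception as exc:
        return fallback(f"tool error: {exc}")
    if not verify_output(result):
        return fallback("unexpected output")
    return result

def get_balance(account_id: str) -> dict:
    # Stand-in for a real financial API call inside the sandbox.
    return {"account_id": account_id, "balance_cents": 12_500}

safe = guarded_call(
    get_balance,
    {"account_id": "acct-1"},
    validate_input=lambda a: isinstance(a.get("account_id"), str),
    verify_output=lambda r: isinstance(r.get("balance_cents"), int) and r["balance_cents"] >= 0,
    fallback=lambda reason: {"error": reason},
)
print(safe)  # the verified result: {'account_id': 'acct-1', 'balance_cents': 12500}
```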
Reliability & Testing
Build comprehensive test harnesses for agent behavior — deterministic scenario tests, regression suites, and evaluation benchmarks
Own the reliability engineering of the agent runtime: graceful degradation when LLMs misbehave, proper retry logic, timeout handling, and circuit breakers
Support adversarial testing and red-teaming efforts from the AI side
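A deterministic scenario test typically replaces the LLM with a recorded stub so the agent's control flow can be asserted exactly. The agent and stub below are hypothetical toy examples of the pattern:

```python
# Hypothetical sketch of a deterministic scenario test: the LLM is replaced
# by a recorded stub so the agent's behavior can be asserted exactly.
def agent_turn(llm, user_msg: str) -> str:
    plan = llm(f"plan for: {user_msg}")
    if plan == "use_tool:balance":
        return "Your balance is $125.00"
    return "I can't help with that"

def stub_llm(prompt: str) -> str:
    # Recorded responses make the test repeatable and independent of a provider.
    recorded = {"plan for: what's my balance?": "use_tool:balance"}
    return recorded.get(prompt, "unknown")

assert agent_turn(stub_llm, "what's my balance?") == "Your balance is $125.00"
assert agent_turn(stub_llm, "write me a poem") == "I can't help with that"
print("scenario tests passed")
```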
Platform & SDK Mindset
Everything you build must be reusable. Zeen is the first product, but the intelligence layer is an SDK — other banking applications will build on top of the same agent patterns, tool integrations, and automation frameworks
Maintain and evolve the shared contracts (Agent Cards, tool schemas, risk gate interfaces) that allow new products to onboard onto the platform with minimal custom work
Think in terms of clean abstractions and extension points, not hard-coded product logic
Technical Environment
Python (primary), with integration touchpoints to Java microservices
LangGraph for agent orchestration; Temporal Cloud (Java SDK) as the durable workflow engine
OPA/Rego for policy enforcement across four risk gate stages (pre-LLM, post-LLM, pre-tool, post-tool)
E2B sandboxed containers for tool execution; MCP for tool protocol
OpenTelemetry for observability; structured artifact logging
LLM providers via a gateway abstraction (model-agnostic)
Fintech domain: Plaid integrations, banking/credit/subscription data
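To illustrate the four risk-gate stages named above, here is a minimal Python sketch that models them as a pipeline of policy checks. This is an assumption-laden illustration; in the actual platform each gate would delegate to OPA/Rego rather than inline lambdas:

```python
# Hypothetical sketch of the four risk-gate stages as a policy pipeline.
GATES = ("pre_llm", "post_llm", "pre_tool", "post_tool")

def evaluate_gates(policies: dict, payloads: dict):
    for stage in GATES:
        check = policies.get(stage, lambda p: True)  # missing gate: allow
        if not check(payloads.get(stage)):
            return {"allowed": False, "blocked_at": stage}
    return {"allowed": True, "blocked_at": None}

# Example policy: block tool calls that would move more than $1,000.
policies = {"pre_tool": lambda p: p["amount_cents"] <= 100_000}
payloads = {"pre_tool": {"amount_cents": 250_000}}
print(evaluate_gates(policies, payloads))  # blocked at the pre-tool gate
```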
What We Are Looking For
Must Have
3+ years building production AI/ML systems (not just notebooks — deployed, monitored, maintained)
Strong Python fundamentals and experience with async patterns, error handling, and production-grade code
Hands-on experience with LLM application development — prompt engineering, context engineering, tool/function calling, and structured outputs
Experience building at least one of: recommendation systems, automation pipelines, or multi-step agent workflows
Understanding of evaluation and testing for non-deterministic systems — you know that “it works on my prompt” is not a test strategy
Comfort working with financial data where correctness and reliability matter more than speed of iteration
Strong Signals
Experience with agent frameworks (LangGraph, LangChain, AutoGen, CrewAI) in production, not just prototypes
Familiarity with memory systems for AI agents — short-term and long-term memory architectures, retrieval-augmented generation, and context window management strategies
Experience with prompt management at scale — versioning, templating, A/B testing, and systematic prompt optimization workflows
Familiarity with sandboxed code execution, MCP, or tool-use patterns for LLM agents
Background in fintech, financial data, or regulated industries
Experience with recommendation engines (collaborative filtering, content-based, hybrid approaches)
Familiarity with workflow orchestration systems (Temporal, Airflow, Prefect) and how AI fits into durable execution patterns
Experience with LLM observability and performance tracking — call latency profiling, token usage monitoring, cost attribution, and tracing through multi-step agent flows
What This Role Is Not
This is not a pure ML research position. We are not training foundation models. You will be building application-layer AI systems on top of LLMs and integrating them into a financial services platform that real people depend on for real money. The challenge is in the systems engineering, reliability, and product thinking — not in publishing papers.