Job Description
We need a hands-on senior Data Science & AI engineer who can build deep analytics pipelines in Python and implement a GenAI Q&A layer over enterprise data. The work is highly technical: data wrangling, metric computation, anomaly detection/forecasting (light ML), retrieval-augmented generation (RAG), and local LLM inference using Llama + Ollama.
Responsibilities (technical)
Build robust analytics code in Python using pandas/numpy to compute, validate, and reconcile KPIs (costing, margins, QBR metrics, operational metrics).
Write efficient transformations (vectorization, memory optimization) and implement repeatable pipelines with tests and data validation.
Develop SQL to extract/shape datasets from enterprise sources and/or a cloud data warehouse; optimize queries as needed.
Implement a governed GenAI “ask the data” prototype:
Use Llama-family models via Ollama (or llama.cpp/vLLM as needed)
Build RAG over structured + semi-structured data (chunking, embeddings, retrieval, reranking)
Produce structured outputs (tables/JSON) and drill-down-ready answers
Add basic guardrails: grounded responses, citations/traceback to data, and safe handling of sensitive fields.
Apply light-to-moderate ML where useful:
anomaly detection (cost variances, outliers, feed failures)
simple forecasting / trend analysis for key metrics
model evaluation and error analysis
Create reproducible experimentation and evaluation:
test question sets for the LLM
accuracy/groundedness checks
latency profiling and performance tuning
Package deliverables for deployment (Docker, config management) and produce technical documentation/runbooks.
Required skills & experience
7+ years of hands-on experience in data science / analytics engineering / ML engineering (individual contributor).
Expert in Python, especially:
pandas, numpy
data cleaning, joins/merges, windowed calculations, time-series handling
performance optimization (vectorization, profiling, memory management)
Strong SQL (complex joins, aggregates, window functions; tuning mindset).
Solid fundamentals in statistics and ML:
feature engineering basics, evaluation metrics, overfitting awareness
scikit-learn (or equivalent) for quick modeling
GenAI implementation experience:
Llama models (or comparable open LLMs)
Ollama for local inference (or similar)
RAG frameworks (LangChain/LlamaIndex) or custom retrieval pipelines
embeddings + vector stores (FAISS/pgvector/Weaviate/Pinecone)
Good engineering habits:
unit tests, data tests, logging, error handling
Git, CI basics
Docker and environment management
Nice-to-have
Snowflake experience (or similar modern cloud data platform).
dbt experience (modeling, tests, docs).
Experience with enterprise “semantic layers” or metric definitions at scale.
Experience building lightweight APIs (FastAPI) for analytics/LLM endpoints.
Familiarity with security constraints (RBAC concepts, masking, audit logs).
Tools/stack (typical)
Python, pandas, numpy, SQL, scikit-learn, Jupyter, Git, Docker, FastAPI (optional), LangChain/LlamaIndex (optional), Ollama, Llama models, vector DB (FAISS/pgvector/Weaviate), cloud data warehouse (Snowflake or equivalent).