Job Description
What You'll Build
Core Responsibilities
Data Architecture & Infrastructure (40%)
● Design and implement a multi-database architecture (MongoDB, Redis, Milvus, Neo4j, BigQuery)
● Build scalable data pipelines for real-time conversation processing and personalization
● Architect ETL/ELT workflows for data migration from legacy systems
● Implement data partitioning, sharding, and optimization strategies for high-throughput systems
● Create data governance frameworks ensuring quality, security, and compliance
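Sharding strategies like those above generally start from a deterministic routing function. A minimal sketch (function name and key format are illustrative, not from any specific system) that maps a record key to one of N shards via a stable hash:

```python
import hashlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Route a record to a shard using a stable hash of its key.

    Python's built-in hash() is salted per process, so a digest is used
    instead to keep routing stable across restarts and machines.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same key always routes to the same shard:
assert shard_for_key("customer:42", 16) == shard_for_key("customer:42", 16)
```

Note that changing `num_shards` remaps most keys, which is why production systems often layer consistent hashing or a shard map on top of this basic scheme.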
Vector & Graph Database Systems (25%)
● Design and optimize Milvus vector collections for semantic search (1024-dim embeddings)
● Build graph schemas in Neo4j for customer journey mapping and persona relationships
● Implement HNSW indexing strategies and similarity search optimization
● Create hybrid search systems combining vector, full-text, and graph queries
● Monitor and tune database performance (query latency, throughput, resource utilization)
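Semantic search ultimately reduces to nearest-neighbor lookup over embedding vectors; an HNSW index approximates this at scale, trading exactness for speed. A brute-force sketch of the underlying operation (pure Python, illustrative only):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3) -> list[str]:
    """Exact nearest-neighbor search -- what HNSW approximates in sub-linear time."""
    scored = ((cosine_similarity(query, v), key) for key, v in vectors.items())
    return [key for _, key in sorted(scored, reverse=True)[:k]]
```

In Milvus the same query runs against a 1024-dimensional collection with an HNSW index, where parameters like `M` and `efConstruction` control the recall/latency trade-off.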
ML Data Infrastructure (20%)
● Build data collection pipelines for LLM fine-tuning (conversation logs, tool executions)
● Create feature stores for GNN training (customer interactions, engagement signals)
● Implement data versioning and lineage tracking for ML experiments
● Design A/B testing data infrastructure with CUPED variance reduction
● Build real-time feature computation pipelines for contextual bandits
Analytics & Monitoring (15%)
● Design BigQuery schemas for marketing analytics and performance tracking
● Create materialized views and aggregation pipelines for real-time dashboards
● Implement data quality monitoring and anomaly detection
● Build observability infrastructure (Prometheus metrics, Grafana dashboards)
● Develop cost optimization strategies for cloud data warehousing
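For data quality monitoring, a common first tripwire is a rolling z-score check on pipeline metrics (row counts, null rates, latencies). A minimal sketch, with window and threshold values chosen for illustration:

```python
def zscore_anomalies(values: list[float], window: int = 30,
                     threshold: float = 3.0) -> list[int]:
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the trailing window's mean."""
    anomalies = []
    for i in range(window, len(values)):
        hist = values[i - window:i]
        m = sum(hist) / window
        sd = (sum((x - m) ** 2 for x in hist) / window) ** 0.5
        if sd and abs(values[i] - m) / sd > threshold:
            anomalies.append(i)
    return anomalies
```

In practice this runs as a scheduled check over BigQuery aggregates, with flagged indices emitted as Prometheus metrics so Grafana can alert on them.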
Technical Stack You'll Work With
Databases & Storage
● MongoDB (conversation state, active sessions)
● Redis (caching, rate limiting, real-time data)
● Milvus (vector embeddings, semantic search)
● Neo4j (customer journey graphs, persona networks)
● BigQuery (analytics warehouse, historical data)
Data Processing & Orchestration
● Apache Airflow or Prefect (workflow orchestration)
● Pandas, Polars (data transformation)
● Apache Spark (optional, for large-scale processing)
● dbt (data transformation and modeling)
ML/AI Data Pipeline
● vLLM (LLM inference serving)
● MLflow (model registry, experiment tracking)
● Sentence Transformers (embedding generation)
● PyTorch, TensorFlow (ML model training)
Cloud & Infrastructure
● Google Cloud Platform (BigQuery, Cloud Storage, Compute)
● Docker & Kubernetes (containerization, orchestration)
● Terraform (infrastructure as code)
● GitHub Actions or GitLab CI (CI/CD pipelines)
Programming & Tools
● Python 3.10+ (primary language)
● SQL (complex queries, query optimization)
● Shell scripting (Bash/Zsh)
● Git (version control)
Requirements
Must-Have Skills
● 5+ years of data engineering experience with production systems
● Expert-level SQL and database design skills
● Strong Python programming (async/await, type hints, testing)
● Experience with at least 3 different database technologies (SQL, NoSQL, Vector, Graph)
● Proven track record building high-scale data pipelines (>1M records/day)
● Deep understanding of data modeling (dimensional, normalized, denormalized)
● Experience with cloud data warehouses (BigQuery, Redshift, or Snowflake)
● Strong knowledge of data quality, validation, and governance
● Excellent debugging and optimization skills
Highly Desirable
● Experience with vector databases (Milvus, Pinecone, Weaviate, Qdrant)
● Experience with graph databases (Neo4j, ArangoDB, Neptune)
● Knowledge of embedding models and semantic search
● Experience with ML data pipelines (feature stores, model training data)
● Understanding of A/B testing and experimental design
● Experience with real-time streaming (Kafka, Pub/Sub, Kinesis)
● Knowledge of LLMs and conversational AI systems
● Experience with data migration projects (especially large-scale)
● Background in marketing technology or customer data platforms
Nice-to-Have
● Experience with PyTorch Geometric or graph neural networks
● Knowledge of marketing analytics (attribution, segmentation, personalization)
● Familiarity with LangChain, LangGraph, or agent frameworks
● Experience with cost optimization in cloud environments
● Contributions to open-source data engineering projects
● Experience with data compliance (GDPR, CCPA)
Key Projects You'll Own
Phase 1: Foundation
● Migrate 10M+ conversation vectors from Pinecone to Milvus
● Design and implement MongoDB schemas for real-time agent state
● Set up Neo4j graph database with customer journey models
● Create BigQuery data warehouse with partitioned tables
Phase 2: Optimization
● Build automated data quality monitoring system
● Implement caching strategies (Redis) for 10x latency reduction
● Optimize vector search query latency
● Create real-time analytics dashboards (Grafana)
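The Redis caching work in Phase 2 follows the cache-aside pattern: check the cache, fall through to the database on a miss, and store the result with a TTL. A minimal in-process sketch of the pattern (a plain dict stands in for Redis so the example runs standalone):

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache-aside with TTL -- the role Redis plays in production,
    modeled as an in-process dict for illustration."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store: dict[str, tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        entry = self.store.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]                       # cache hit
        self.misses += 1
        value = compute()                         # fall through to the database
        self.store[key] = (time.monotonic(), value)
        return value
```

The latency win comes from the hit path skipping the expensive `compute()` call entirely; in Redis the TTL is set atomically with the write via `SET key value EX seconds`.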
Phase 3: ML Infrastructure
● Build LLM fine-tuning data pipeline
● Implement feature store for GNN training
● Create A/B testing data infrastructure
● Design multi-armed bandit state management
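The bandit state management in Phase 3 boils down to persisting per-arm pull counts and running mean rewards. A minimal epsilon-greedy sketch (the policy choice is illustrative; contextual bandits add per-context state on top of this):

```python
import random


class EpsilonGreedyBandit:
    """Minimal multi-armed bandit state: the counts and running means
    a Redis or MongoDB store would persist between decisions."""

    def __init__(self, arms: list[str], epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = {arm: 0 for arm in arms}
        self.values = {arm: 0.0 for arm in arms}

    def select(self) -> str:
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))    # explore
        return max(self.values, key=self.values.get)   # exploit best known arm

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        n = self.counts[arm]
        # Incremental mean: avoids storing the full reward history.
        self.values[arm] += (reward - self.values[arm]) / n
```

Because the update is an incremental mean, state stays O(arms) regardless of traffic volume, which is what makes real-time serving from a key-value store practical.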
Work Environment
● Collaborative team: Work with ML engineers, backend developers, and data scientists
● Modern stack: Latest technologies and tools
● Impact: Your work directly affects millions of marketing interactions
● Autonomy: Own your projects end-to-end
● Growth: Clear path to Senior/Lead/Principal roles