Job Description
Role Summary
The Data Architect Leader is accountable for leading enterprise client engagements to build, modernize, and operationalize large-scale data platforms and data products in complex, regulated environments. This role combines hands-on architecture leadership with program-level guidance, shaping client strategy, target-state architectures, governance operating models, and multi-year transformation roadmaps across cloud, hybrid, and multi-cloud footprints.
You will serve as a senior trusted advisor to executives and delivery leaders: driving architecture decisions, guiding multiple workstreams and teams, establishing standards and reference implementations, and ensuring solutions are secure, scalable, and cost-efficient and deliver measurable outcomes, including enablement of GenAI and agentic capabilities on governed enterprise data.
Key Responsibilities
1) Client & Engagement Leadership
Lead end-to-end architecture for enterprise modernization programs (discovery → strategy → delivery → run), aligning business outcomes, technology decisions, and delivery plans.
Serve as an executive-facing advisor, facilitating trade-off decisions across scope, risk, cost, timeline, and operating model.
Lead architecture governance within engagements: architecture review boards, decision logs, standards enforcement, and exception management.
Shape “platform + product” delivery models that align data platform modernization with analytics and AI/agentic product outcomes (time-to-value, adoption, risk posture).
2) Enterprise Data Platform Strategy & Roadmaps
Define target-state enterprise data architecture and multi-year transformation roadmaps (platform, operating model, governance, migration waves).
Establish and standardize architecture patterns for:
Ingestion (batch/streaming/event-driven)
Transformation (ELT/ETL, distributed processing)
Storage (lakehouse/warehouse, domain-oriented stores)
Serving (semantic layers, APIs, reverse ETL, activation)
Observability (quality, lineage, reliability, FinOps)
Define a platform strategy that supports AI-ready data (data contracts, metadata completeness, lineage, quality SLOs, and curated “golden” datasets for AI use cases).
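To make "data contracts" concrete, here is a minimal sketch in Python: a versioned contract bundling schema, ownership, and quality SLOs, with a simple row-level check. All field and dataset names are illustrative and not tied to any specific contract framework.

```python
from dataclasses import dataclass

# Illustrative data contract: schema, quality SLOs, and ownership metadata
# bundled into one versioned artifact. Field names are hypothetical.
@dataclass(frozen=True)
class DataContract:
    dataset: str
    version: str
    owner: str                      # accountable domain team
    schema: dict                    # column name -> expected Python type
    freshness_slo_minutes: int      # max allowed staleness
    completeness_slo: float         # min fraction of non-null rows

    def validate_row(self, row: dict) -> list[str]:
        """Return a list of contract violations for one record."""
        errors = []
        for col, expected_type in self.schema.items():
            if col not in row:
                errors.append(f"missing column: {col}")
            elif not isinstance(row[col], expected_type):
                errors.append(f"{col}: expected {expected_type.__name__}")
        return errors

contract = DataContract(
    dataset="customers", version="1.2.0", owner="crm-domain",
    schema={"customer_id": str, "lifetime_value": float},
    freshness_slo_minutes=60, completeness_slo=0.99,
)
print(contract.validate_row({"customer_id": "C-1", "lifetime_value": "oops"}))
```

In practice the contract would live in version control alongside the producing pipeline, with violations feeding the quality SLO and issue-management workflows described below.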
3) Solution Architecture & Reference Implementations
Produce high-quality artifacts and lead their adoption across teams:
Enterprise logical/physical data architecture
Domain/data product designs and contracts
Integration patterns (including ERP systems), data flows, and NFRs (availability, RPO/RTO, performance, cost)
Security architecture for data (identity, network boundaries, key management, encryption, secrets)
Define reusable reference architectures and accelerators (templates for pipelines, IaC patterns, CI/CD for data, quality checks, governance integration).
4) Governance, Security, Privacy & Compliance-by-Design
Define and enforce enterprise standards for:
Data modeling (conceptual/logical/physical), naming, domain boundaries, and data contracts
Data quality rules, SLA/SLO definitions, and issue management workflows
Metadata, cataloging, lineage, stewardship workflows, and auditability
Privacy/security controls: RBAC/ABAC, encryption, retention, masking/tokenization, consent (as applicable)
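As a sketch of the masking/tokenization controls above, the following shows deterministic keyed tokenization (so masked datasets remain joinable) and partial masking for display. The key is a placeholder; real deployments would source it from a KMS with rotation, which is out of scope here.

```python
import hmac
import hashlib

# Illustrative deterministic tokenization: the same input always maps to the
# same token (masked datasets stay joinable), but the raw value cannot be
# recovered without the key. Key management (KMS, rotation) is out of scope.
SECRET_KEY = b"replace-with-a-managed-secret"   # placeholder, not a real key

def tokenize(value: str) -> str:
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return "tok_" + digest.hexdigest()[:16]

def mask_email(email: str) -> str:
    """Partial masking for display surfaces (first character + domain kept)."""
    local, _, domain = email.partition("@")
    return f"{local[0]}***@{domain}"

print(tokenize("alice@example.com"))
print(mask_email("alice@example.com"))
```

Deterministic tokens support analytics joins across masked tables; where re-identification risk from determinism is unacceptable, randomized tokenization with a secure vault is the alternative pattern.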
Architect governance platforms and operating models aligned to regulatory requirements (GDPR, CCPA, HIPAA as applicable), including controls evidence and audit readiness.
5) Master & Reference Data Leadership (MDM)
Lead design of MDM/reference data strategies: domain ownership, golden record patterns, survivorship rules, match/merge approaches, and stewardship workflows.
Define publication/consumption patterns for master/reference data across analytical and operational ecosystems.
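The survivorship and golden-record patterns above can be sketched as follows: per attribute, prefer the highest-priority source and break ties by recency. Source names and priorities here are hypothetical, and real match/merge logic would sit upstream of this step.

```python
from datetime import date

# Illustrative survivorship rule for a golden record: per attribute, prefer
# the highest-priority source; break ties by most recent update.
SOURCE_PRIORITY = {"crm": 3, "erp": 2, "web_form": 1}

def golden_record(candidates: list[dict]) -> dict:
    """Merge per-source records into one golden record, attribute by attribute."""
    attrs = {k for rec in candidates for k in rec if k not in ("source", "updated")}
    golden = {}
    for attr in attrs:
        holders = [r for r in candidates if r.get(attr) is not None]
        best = max(holders, key=lambda r: (SOURCE_PRIORITY[r["source"]], r["updated"]))
        golden[attr] = best[attr]
    return golden

records = [
    {"source": "erp", "updated": date(2024, 3, 1), "name": "A. Smith", "phone": "555-0100"},
    {"source": "crm", "updated": date(2024, 1, 5), "name": "Alice Smith", "phone": None},
]
print(golden_record(records))
```

Note that survivorship runs per attribute, not per record: CRM wins the name on source priority, while the phone survives from ERP because CRM has no value for it.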
6) GenAI & Agentic Solution Architecture
Lead architecture and solutioning for GenAI and agentic use cases on enterprise data (e.g., RAG, semantic search, agent-assisted analytics, automated data operations, customer/employee copilots), establishing scalable patterns for GenAI on governed data.
Define end-to-end LLMOps/AgentOps practices: prompt/version management, evaluation harnesses, offline/online testing, monitoring, cost controls, rollback strategies, and production gating.
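One way to picture the "evaluation harness + production gating" combination: run a candidate over a golden set, score each answer, and block the release if the pass rate drops below a threshold. The model function and scorer below are stand-ins, not a real LLM call or a specific eval framework.

```python
# Illustrative offline evaluation gate for an LLM-backed feature. The golden
# set, candidate model, and scorer are all hypothetical stand-ins.
GOLDEN_SET = [
    {"prompt": "capital of France?", "expected": "paris"},
    {"prompt": "2 + 2?", "expected": "4"},
]

def candidate_model(prompt: str) -> str:
    # Stand-in for a real LLM call, returning canned answers.
    return {"capital of France?": "Paris", "2 + 2?": "4"}.get(prompt, "")

def score(answer: str, expected: str) -> bool:
    # Naive containment check; real harnesses use graded or LLM-judged scoring.
    return expected.lower() in answer.lower()

def evaluate(model, golden, threshold=0.95):
    """Score the model over the golden set and gate the release."""
    passed = sum(score(model(c["prompt"]), c["expected"]) for c in golden)
    rate = passed / len(golden)
    return {"pass_rate": rate, "release_ok": rate >= threshold}

print(evaluate(candidate_model, GOLDEN_SET))
```

In a CI/CD pipeline the `release_ok` flag would gate promotion, and the same harness rerun online (shadow or canary traffic) closes the offline/online testing loop.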
Architect agentic workflows using agent SDKs/frameworks (e.g., Google ADK, AutoGen), including:
Tool/function calling patterns, planning vs. execution separation, memory strategies, and human-in-the-loop controls
Guardrails (policy checks, groundedness, citations/attribution patterns, refusal behavior), and safe tool access
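The tool-calling and human-in-the-loop controls above can be sketched framework-independently: a closed registry of tools, a refusal path for anything outside it, and an approval gate for tools flagged as sensitive. Tool names and behaviors here are invented for illustration.

```python
# Illustrative tool-calling pattern, independent of any specific agent SDK:
# a closed registry of tools, a refusal guardrail for unknown tools, and a
# human-in-the-loop gate for sensitive actions. All names are hypothetical.
TOOLS = {
    "lookup_order": {"fn": lambda order_id: f"order {order_id}: shipped",
                     "sensitive": False},
    "issue_refund": {"fn": lambda order_id: f"refund for {order_id} queued",
                     "sensitive": True},
}

def execute_tool(name: str, arg: str, human_approved: bool = False) -> str:
    if name not in TOOLS:
        return "refused: unknown tool"              # guardrail: closed tool set
    tool = TOOLS[name]
    if tool["sensitive"] and not human_approved:
        return "pending: human approval required"   # human-in-the-loop gate
    return tool["fn"](arg)

print(execute_tool("lookup_order", "A1"))
print(execute_tool("issue_refund", "A1"))
print(execute_tool("issue_refund", "A1", human_approved=True))
```

The same shape generalizes: the planner proposes tool calls, the executor enforces policy before running them, and sensitive calls pause for a human decision rather than executing autonomously.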
Guide platform adoption and reference implementations leveraging:
Databricks AI capabilities (e.g., Mosaic AI, model serving, governed feature/embedding pipelines, agent/assistant patterns, Agent Bricks / packaged agent accelerators where relevant to the client ecosystem)
Snowflake AI capabilities (e.g., Cortex and native AI/ML services for enterprise workloads)
Design knowledge graph / semantic layer strategies to improve retrieval quality and reasoning (entity resolution, ontologies/taxonomies, relationships, graph + vector hybrid patterns, and governance of business definitions).
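As a toy illustration of the "graph + vector hybrid" idea, the sketch below blends a keyword-overlap score with cosine similarity over embedding vectors. Real systems would use a proper index, learned embeddings, and graph traversal; the documents, vectors, and weighting here are invented.

```python
import math

# Illustrative hybrid retrieval scoring: blend a keyword-overlap score with
# cosine similarity over toy embedding vectors. Docs, vectors, and the alpha
# weight are invented for the example.
DOCS = {
    "doc1": {"text": "quarterly revenue by region", "vec": [1.0, 0.0, 0.5]},
    "doc2": {"text": "employee onboarding checklist", "vec": [0.0, 1.0, 0.0]},
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def hybrid_score(query_text: str, query_vec: list[float], doc: dict,
                 alpha: float = 0.5) -> float:
    """Weighted blend of lexical overlap and vector similarity."""
    q_terms, d_terms = set(query_text.split()), set(doc["text"].split())
    keyword = len(q_terms & d_terms) / len(q_terms)
    return alpha * keyword + (1 - alpha) * cosine(query_vec, doc["vec"])

best = max(DOCS, key=lambda d: hybrid_score("revenue by region", [1.0, 0.0, 0.4], DOCS[d]))
print(best)
```

A knowledge graph extends this further: after hybrid retrieval, related entities and governed business definitions can be pulled in via graph relationships to ground the model's reasoning.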
Embed controls for LLM/GenAI risks: PII handling, prompt-injection defenses, data exfiltration controls, safe retrieval patterns, and evaluation/monitoring.
7) People Leadership, Mentorship & Practice Building (if applicable)
Mentor and coach architects and senior engineers; establish architecture career paths, skill standards, and review processes.
Contribute to practice capabilities: reusable assets, playbooks, reference architectures, and internal enablement (brown bags, training).
Support pre-sales and solutioning: discovery workshops, proposals/SOW input, architecture in RFP responses, and delivery approach definition.
Build repeatable offerings (platform modernization + GenAI enablement packages) with clear scope, outcomes, and implementation patterns.
Required Skills & Experience
15+ years in data engineering/architecture with significant enterprise-scale platform design and modernization experience.
Demonstrated leadership of multi-team, multi-workstream data programs (including distributed/onshore-offshore and/or multi-vendor environments).
Deep expertise in modern data architectures and patterns:
Batch + real-time streaming/event-driven (e.g., Spark, Kafka)
Warehousing and lakehouse (e.g., Snowflake, Databricks/Delta Lake, BigQuery, Redshift, Synapse)
Data product and federated/data mesh-aligned thinking (with pragmatic governance)
Strong cloud architecture depth in at least one major provider (AWS/Azure/GCP), including networking, identity, and security patterns for data platforms.
Strong knowledge of storage formats and table technologies (Parquet/ORC; Delta/Iceberg/Hudi).
Proven ability to implement or guide ETL/ELT and orchestration patterns (Airflow, dbt, Dataflow, Glue, ADF, NiFi, etc.).
Hands-on knowledge of metadata/catalog/lineage solutions (e.g., AWS Glue Data Catalog, Microsoft Purview, Databricks Unity Catalog or equivalents).
Executive-level communication skills: can explain architecture trade-offs clearly to both technical teams and business leadership.
Preferred Qualifications
BS/MS in Computer Science, Data Engineering, or related field (or equivalent practical experience).
Certifications (preferred): AWS Solutions Architect / Data Analytics, Google Professional Data Engineer, Azure Data Engineer, Databricks/Snowflake.
Experience with hybrid/multi-cloud and large migration programs (warehouse-to-lakehouse, on-prem to cloud, Hadoop modernization, etc.).
Data observability and quality frameworks (Great Expectations, Deequ) plus end-to-end monitoring/SLO practices.
Strong CI/CD + DataOps patterns (IaC, environment promotion, testing, policy-as-code); MLOps integration patterns a plus.
Semantic layer / BI enablement experience (Power BI, Tableau, Looker) for governed enterprise reporting.
Hands-on experience with one or more AI suites/platforms: Gemini Enterprise, Databricks Mosaic AI, Snowflake Cortex (or equivalent).