Job Description
We are a rapidly expanding AI startup that is transforming digital advertising with groundbreaking AI technology and innovative solutions. We are seeking a seasoned and visionary Principal Data Engineer & Tech Lead Manager to join our team and play a pivotal role in building and leading our data infrastructure that powers the future of advertising.
As a Principal Data Engineer & Tech Lead Manager, you will be responsible for architecting and leading the development of large-scale data platforms that process petabytes of consumer behavioral data. You will take ownership of the entire data ecosystem, from ingestion to transformation to serving, ensuring that our machine learning models have access to high-quality, reliable, and performant data pipelines. You will lead and manage a team of senior data engineers, setting technical direction, driving execution, and mentoring the team to deliver scalable solutions while balancing hands-on technical leadership with people management responsibilities.
The ideal candidate will be a highly experienced data engineering leader with deep expertise in building and managing petabyte-scale data systems. You should have a proven track record of leading data engineering teams, designing lakehouse architectures, and implementing robust data platforms on cloud infrastructure (preferably AWS), with a focus on performance, reliability, and cost optimization.
Key Responsibilities:
● Lead Data Platform Strategy: Architect and implement a comprehensive data platform capable of processing and managing petabytes of consumer behavioral data.
● Build Scalable Data Infrastructure: Design and build a highly scalable, reliable lakehouse architecture on AWS, leveraging technologies such as Apache Spark and Apache Iceberg.
● Team Leadership: Lead, mentor, and manage a team of data engineers, fostering a culture of technical excellence, collaboration, and continuous improvement.
● Technical Leadership: Provide hands-on technical leadership, set engineering standards, conduct code reviews, and drive architectural decisions across the platform.
● Data Pipeline Development: Oversee the design and implementation of robust ETL pipelines using Apache Airflow, Spark, and modern data orchestration frameworks.
● Performance Optimization: Drive performance tuning initiatives for distributed computing workloads, including Spark optimization and query optimization in engines such as Trino.
● Infrastructure as Code: Champion infrastructure automation using Terraform, ensuring all data infrastructure is version-controlled, reproducible, and follows best practices.
● Kubernetes Operations: Manage and optimize Spark workloads on Kubernetes (EKS), implementing autoscaling, resource management, and cost optimization strategies.
● Data Quality and Governance: Establish and enforce data quality standards, validation frameworks, and data governance practices across the organization.
● Feature Engineering Support: Collaborate with ML teams to build and optimize feature stores and feature data pipelines that support consumer behavior prediction models.
● Monitoring and Observability: Implement comprehensive monitoring, alerting, and observability solutions for data pipelines and infrastructure using tools such as Prometheus.
● Cross-functional Collaboration: Work closely with ML engineers and leadership to ensure the data platform meets evolving business needs.
● Technical Documentation: Establish and maintain comprehensive technical documentation for architectures, processes, and best practices.
● Innovation and Research: Stay current with emerging data technologies, evaluate new tools and techniques, and drive innovation in the data platform.
● Project & Delivery Management: Oversee project planning, execution, and delivery timelines, ensuring the team meets commitments and delivers high-quality solutions.
Qualifications:
● Bachelor's/Master's/PhD in Computer Science, Data Engineering, or a related field
● 7+ years of experience in data engineering, including 2+ years leading data engineering teams
● Proven experience architecting and managing petabyte-scale production data systems
● Strong understanding of dimensional modeling and consumer data architectures.
● Expertise in Apache Spark (tuning/optimization on Kubernetes), lakehouse architectures (Apache Iceberg), Apache Airflow, and distributed query engines (Trino/Presto).
● Extensive experience with AWS (EKS, EC2, S3, IAM) and Kubernetes (deployment, autoscaling with Karpenter).
● Strong Infrastructure as Code (Terraform) and GitOps experience. Expert-level Python programming skills, including PySpark, data processing libraries, and framework development. Experience with CI/CD for data infrastructure.
● Familiarity with data validation, schema management, and data quality systems.
● Experience with BI tools such as Apache Superset, and with JVM-based monitoring (JMX metrics, Prometheus exporters)
● Proven track record of leading and mentoring teams, with strong people management and leadership skills. Excellent problem-solving and architectural skills for complex problems, and outstanding communication with technical and non-technical stakeholders.
● Experience with talent development and building high-performing engineering teams
Nice to have:
● Prior experience in AdTech or MarTech industries
● Experience with feature stores and feature engineering platforms
● Experience with machine learning model serving and inference infrastructure
● Prior experience working in a fast-paced startup environment
● Publications or contributions to open-source data engineering projects
What we offer:
We offer a competitive salary, comprehensive benefits, and a dynamic work environment. If you are passionate about building world-class data platforms that power AI-driven consumer insights, have strong leadership capabilities, and want to be part of a rapidly growing startup transforming the advertising industry, we encourage you to apply for this exciting opportunity!