Job Description
Lead Data Engineer
Location & Work Model
Location: Remote
Work Model: Full-time
Timings: 8:00 AM to 5:00 PM IST
Job Summary
We are seeking a highly experienced Lead Data Engineer with 8-10+ years of hands-on experience designing, building, and managing large-scale data platforms and pipelines in cloud environments. The ideal candidate will have strong expertise in AWS, Snowflake, PySpark, and ETL frameworks, along with proven leadership experience delivering scalable, secure, and high-performance data solutions. The role requires close collaboration with business stakeholders and cross-functional teams, mentoring junior engineers, and driving data-driven decision-making.
Key Responsibilities
Design, develop, and maintain scalable cloud-based data pipelines using AWS services such as S3, EMR, Glue, Lambda, and Kinesis, together with Snowflake.
Lead end-to-end data ingestion, transformation, and orchestration workflows using PySpark, Apache Airflow, Kafka, and ETL tools.
Architect and optimize data warehousing solutions on Snowflake, Hive, Redshift, and HDFS for large-volume, high-performance analytics.
Manage and execute on-premises-to-cloud data migrations, ensuring minimal downtime, cost optimization, and improved scalability.
Implement and enforce data governance, security, audit logging, and compliance standards across data platforms.
Develop and optimize PySpark jobs for data ingestion into Snowflake, Hive, and HBase tables.
Monitor and tune system performance, including query optimization, error handling, and recovery mechanisms.
Collaborate with BI and analytics teams to support reporting solutions using Power BI and Tableau.
Lead and mentor a team of data engineers, conducting code reviews and promoting best practices.
Work in an Agile/Scrum environment, participating in sprint planning, POCs, and continuous improvement initiatives.
Coordinate with cross-functional teams and stakeholders to deliver business-aligned data solutions.
Required Skills & Experience
8-10+ years of experience in Data Engineering with strong exposure to cloud-based data platforms.
Strong hands-on experience with AWS (EMR, Glue, Lambda, S3, Kinesis, ECS).
Expertise in Snowflake (development and administration).
Advanced proficiency in Python and PySpark, including data structures and distributed processing.
Solid experience with ETL tools such as Informatica PowerCenter, Informatica BDM/BDE, Alteryx, and dbt.
Strong knowledge of Big Data technologies: Hadoop, Hive, HDFS, HBase, and Spark.
Experience with real-time data processing using Kafka, ActiveMQ, and Spark Streaming.
Proficiency in SQL and databases: Oracle, Hive, Snowflake, Redshift, Netezza, and Sybase.
Hands-on experience with job scheduling tools: Airflow, Control-M, Autosys, Tidal, and cron.
Experience in performance optimization, data validation, audit logging, and error handling.
Exposure to Banking, Investment, or Financial Services domains is a strong plus.
Preferred Qualifications
AWS Certified Solutions Architect – Associate
AWS Certified Developer – Associate
Experience working with BI tools such as Power BI and Tableau
Exposure to Databricks and modern analytics platforms
Strong stakeholder management and leadership skills
Ideal Candidate Profile
The ideal candidate is a hands-on technical leader who can balance architecture, development, and team leadership. You should be comfortable working in fast-paced environments, handling complex data challenges, and delivering reliable, business-ready data solutions at scale.