Job Description
We are looking for a skilled Data Engineer with strong experience in Apache Spark to design, build, and optimize large-scale data pipelines in a distributed environment. The ideal candidate has hands-on expertise in modern data engineering practices, cloud platforms, and scalable data processing frameworks.
Key Responsibilities
Design, develop, and maintain ETL/ELT pipelines using Apache Spark (batch and/or streaming).
Build and optimize distributed data processing workflows on Spark (PySpark/Scala/Java).
Work with cloud-based data ecosystems (AWS, GCP, or Azure) to develop scalable data solutions.
Collaborate with data scientists, analysts, and backend engineers to deliver reliable, high‑quality data products.
Implement and maintain data quality checks, monitoring, and alerting for data pipelines.
Optimize Spark jobs for performance, cost efficiency, and scalability.
Manage and model data in data lakes, data warehouses, and/or structured storage systems.
Contribute to data architecture design, including schema modeling, partitioning, and data lifecycle management.
Automate infrastructure and pipeline deployments using CI/CD and IaC frameworks.
Ensure compliance with data governance, security, and privacy standards.
Required Skills & Qualifications
Strong hands-on experience with Apache Spark (batch or streaming).
Proficiency in Python, Scala, or Java for data processing.
Experience with at least one cloud platform (AWS, GCP, or Azure).
Solid understanding of distributed systems, data partitioning, and performance tuning.
Hands-on experience with data lake technologies (e.g., S3, GCS, Azure Data Lake).
Experience with relational databases and SQL.
Familiarity with CI/CD workflows and version control (Git).
Experience with Infrastructure-as-Code tools (Terraform, CloudFormation, etc.) is a plus.
Knowledge of workflow orchestration tools such as Airflow, Dagster, or Prefect.
Strong problem‑solving skills and ability to work in cross‑functional teams.
Preferred Qualifications
Experience with Spark on Kubernetes, Databricks, EMR, or Dataproc.
Knowledge of streaming technologies (Kafka, Pub/Sub, Kinesis).
Familiarity with Delta Lake, Iceberg, or Hudi.
Background in data modeling (ELT/ETL design, star/snowflake schemas).
Experience with real‑time and near‑real‑time data pipelines.
Ready to Apply?
Don't miss this opportunity! Apply now and join our team.
Job Details
Posted Date: March 18, 2026
Job Type: Technology
Location: India
Company: Google