Job Description
We are looking for a skilled Data Engineer with strong experience in Apache Spark to design, build, and optimize large-scale data pipelines in a distributed environment. The ideal candidate has hands-on expertise in modern data engineering practices, cloud platforms, and scalable data processing frameworks.
Key Responsibilities
Design, develop, and maintain ETL/ELT pipelines using Apache Spark (batch and/or streaming).
Build and optimize distributed data processing workflows on Spark (PySpark/Scala/Java).
Work with cloud-based data ecosystems (AWS, GCP, or Azure) to develop scalable data solutions.
Collaborate with data scientists, analysts, and backend engineers to deliver reliable, high‑quality data products.
Implement and maintain data quality checks, monitoring, and alerting for data pipelines.
Optimize Spark jobs for performance, cost efficiency, and scalability.
Manage and model data in data lakes, data warehouses, and/or structured storage systems.
Contribute to data architecture design, including schema modeling, partitioning, and data lifecycle management.
Automate infrastructure and pipeline deployments using CI/CD and IaC frameworks.
Ensure compliance with data governance, security, and privacy standards.
Required Skills & Qualifications
Strong hands-on experience with Apache Spark (batch or streaming).
Proficiency in Python, Scala, or Java for data processing.
Experience with at least one cloud platform (AWS, GCP, or Azure).
Solid understanding of distributed systems, data partitioning, and performance tuning.
Hands-on experience with data lake technologies (e.g., S3, GCS, Azure Data Lake).
Experience with relational databases and SQL.
Familiarity with CI/CD workflows and version control (Git).
Experience with Infrastructure-as-Code tools (Terraform, CloudFormation, etc.) is a plus.
Knowledge of workflow orchestration tools such as Airflow, Dagster, or Prefect.
Strong problem‑solving skills and ability to work in cross‑functional teams.
Preferred Qualifications
Experience with Spark on Kubernetes, Databricks, EMR, or Dataproc.
Knowledge of streaming technologies (Kafka, Pub/Sub, Kinesis).
Familiarity with Delta Lake, Iceberg, or Hudi.
Background in data modeling (ELT/ETL design, star/snowflake schemas).
Experience with real‑time and near‑real‑time data pipelines.
Ready to Apply?
Don't miss this opportunity! Apply now and join our team.
Job Details
Posted Date: March 18, 2026
Job Type: Technology
Location: India
Company: Google