Job Description
Experience: 5 to 8 years
Skills: Scala + Spark
A hands-on technical lead responsible for designing, developing, optimizing, and stabilizing core Scala-Spark data pipelines while mentoring junior engineers and ensuring delivery quality.
Core Responsibilities (Scala + Spark Delivery • Hands‑On Ownership)
• Design and implement Scala + Spark pipelines using Dataset/DataFrame APIs with a strong emphasis on typed, performant, and modular code (a brief pipeline sketch follows this list).
• Translate functional requirements into efficient transformations, ingestion logic, and data models using best‑practice Scala design patterns.
• Build reusable libraries/utilities for data parsing, validation, transformation, and Spark job orchestration.
• Analyze Spark jobs using Spark UI, event logs, and metrics to identify bottlenecks such as skew, shuffles, and spills.
• Apply optimization techniques such as broadcast joins, partitioning strategies, file‑size tuning, caching, and minimizing wide transformations.
• Ensure robust data handling with checkpointing/recovery logic (if streaming adoption is part of the project).
• Follow and enforce engineering standards for Scala coding, functional purity, immutability, type‑safety, naming conventions, and error‑handling.
• Participate in code reviews, ensuring high quality, maintainability, and production readiness.
• Work with testing teams to define unit, integration, and regression test coverage for pipelines and utility modules.
• Support sprint planning, estimation, technical grooming, and production migration activities.
• Collaborate with architects, product owners, QA, operations/SRE, cloud platform teams, and the teams owning dependent upstream/downstream systems.
• Drive troubleshooting and root‑cause analysis for issues encountered across environments.
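To illustrate the kind of typed, modular pipeline code these responsibilities call for, here is a minimal sketch that enriches a large fact Dataset by broadcasting a small dimension Dataset. The case classes, column names, and the OrderEnrichment object are hypothetical, not part of any existing codebase.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.broadcast

// Hypothetical typed models for a fact/dimension enrichment step.
final case class Order(orderId: Long, customerId: Long, amount: Double)
final case class Customer(customerId: Long, segment: String)
final case class EnrichedOrder(orderId: Long, customerId: Long, amount: Double, segment: String)

object OrderEnrichment {

  // Pure, reusable transformation: the small dimension side is broadcast,
  // so the join avoids shuffling the large fact Dataset.
  def enrich(orders: Dataset[Order], customers: Dataset[Customer])(implicit spark: SparkSession): Dataset[EnrichedOrder] = {
    import spark.implicits._
    val dim = broadcast(customers)
    orders
      .joinWith(dim, orders("customerId") === dim("customerId"))
      .map { case (order, cust) =>
        EnrichedOrder(order.orderId, order.customerId, order.amount, cust.segment)
      }
  }
}
```

The explicit broadcast hint keeps the join narrow on the fact side; on Spark 3.x, spark.sql.autoBroadcastJoinThreshold or adaptive query execution may select the same plan automatically.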
Must‑Have Technical Skills
• Strong grasp of Scala fundamentals: collections, pattern‑matching, functional constructs, immutability, error‑handling (Option/Try/Either), APIs, and modular code design.
• Experience writing reusable Scala functions, case-class-based models, and typed Dataset operations (a small parsing sketch follows this list).
• Hands‑on experience with Spark Core, Spark SQL, Spark Datasets, Spark optimization, and understanding of execution plans (explain).
• Knowledge of Catalyst optimizer basics and ability to interpret query plans.
• Understanding of shuffles, partitions, caching, broadcast joins, and narrow/wide transformations.
• Strong SQL (joins, window functions, incremental logic, aggregations).
• Knowledge of schema evolution, data modeling for analytical pipelines, and modern lakehouse table formats (Delta/Iceberg/Hudi).
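As a hedged illustration of the Option/Try/Either style listed above: the sketch below parses a raw record into a case-class model with Either, so malformed rows become values that can be routed to a quarantine path rather than exceptions that fail the job. RawRecord, Transaction, and the field names are assumptions for illustration only.

```scala
import scala.util.Try

// Hypothetical raw input shape and typed target model.
final case class RawRecord(id: String, amount: String)
final case class Transaction(id: Long, amount: Double)

object RecordParser {

  // Either keeps failures as values instead of thrown exceptions,
  // so downstream code can split good and bad rows explicitly.
  def parse(raw: RawRecord): Either[String, Transaction] =
    for {
      id     <- Try(raw.id.trim.toLong).toEither.left.map(_ => s"invalid id: ${raw.id}")
      amount <- Try(raw.amount.trim.toDouble).toEither.left.map(_ => s"invalid amount: ${raw.amount}")
    } yield Transaction(id, amount)
}

// Example usage (Scala 2.13 collections):
// val (errors, parsed) = rawRecords.map(RecordParser.parse).partitionMap(identity)
```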
Deliverables & KPIs
• High‑quality Scala‑Spark modules, utilities, and transformation pipelines.
• Readable, maintainable code with supporting test suites.
• Design notes, runbooks, performance notes, and environment‑specific tuning recommendations.
• Code Quality: Low defects, high code review acceptance, strong test coverage.
• Performance: Reduced job runtimes, minimized shuffle volumes, predictable SLA behavior.
• Delivery: On‑time module completion; smooth integration with upstream/downstream components.
Interview Focus / Practical Evaluation
• Hands‑on coding in Scala: transformations, pattern matching, Option/Try/Either usage, functional vs imperative differences.
• Ability to identify issues in sample Scala code and propose improvements.
• Reading an execution plan and describing join strategy, partition usage, and optimization opportunities.
• Scenario questions about handling skew, immutable transformations, and minimizing shuffles (a salting sketch follows).
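For the skew scenario, a minimal sketch of one common mitigation, key salting, is shown below; the SkewMitigation object, the "salt" column name, and the bucket count are illustrative assumptions.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object SkewMitigation {

  // Salt the skewed (large) side with a random bucket and replicate the small
  // side across all buckets, so hot keys are spread over several partitions.
  def saltedJoin(large: DataFrame, small: DataFrame, key: String, saltBuckets: Int): DataFrame = {
    val salted     = large.withColumn("salt", (rand() * saltBuckets).cast("int"))
    val replicated = small.withColumn("salt", explode(array((0 until saltBuckets).map(lit): _*)))
    salted.join(replicated, Seq(key, "salt")).drop("salt")
  }
}
```

On Spark 3.x, adaptive query execution (spark.sql.adaptive.skewJoin.enabled) often handles skewed joins automatically; manual salting like this remains useful when AQE is disabled or insufficient.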