Job Description
Dear Candidate,
We are hiring for Azure Databricks for the Hyderabad location.
Please share the details below along with your updated resume.
Note: Screen-selected resumes will be scheduled for interviews.
Name
Contact Number
Email ID
Current Location
Preferred Location
Total Experience
Relevant Experience
Current Organisation
Current CTC
Expected CTC
Notice Period
Timing
Date
Experience: 5+ Years
Location: [Onsite/Hybrid/Remote]
Employment Type: Full-time
Role Overview
We’re seeking a detail-oriented L1 Data Engineering (Databricks) Engineer to provide first-line support for data pipelines and jobs running on Databricks. You will be responsible for monitoring, incident response, job reruns, SLA adherence, and operational hygiene. The ideal candidate has solid exposure to PySpark, Delta Lake, Azure Data Lake Storage, and Databricks Workflows/Jobs, with a strong focus on stability and service continuity.
Key Responsibilities
Monitor & Support Pipelines:
Monitor Databricks Jobs/Workflows, ADX/ADF pipeline runs, and streaming sources.
Proactively detect failures, lag, and SLA breaches; perform first-line triage.
Incident Management:
Acknowledge incidents, classify severity (SEV levels), and follow the ITIL-based process.
Execute runbooks, perform safe reruns, and handle partial reprocessing (see the rerun sketch at the end of this section).
Escalate to L2/L3 with detailed incident documentation (logs, job IDs, inputs/outputs).
Operational Tasks:
Validate data availability and quality at key checkpoints (bronze/silver/gold layers).
Manage ad hoc fixes (e.g., small Spark config changes, partition reruns).
Maintain metadata, service accounts, and token rotations as per SOPs.
Access & Governance:
Basic administration in Databricks: cluster start/stop, job scheduling checks, permission requests, workspace hygiene.
Work with Unity Catalog permissions and Key Vault integrations under guidance.
Documentation & Reporting:
Update runbooks, the Known Error Database (KEDB), and SOPs.
Publish daily/weekly ops reports, SLA metrics, and post-incident summaries.
Collaboration:
Coordinate with Data Engineers, Platform Engineers, Security, and Product teams.
Participate in release readiness and operational acceptance for new pipelines.
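The "safe reruns" and "partition reruns" items above usually come down to idempotent writes. Below is a minimal PySpark sketch of rerunning a single date partition; the paths, table layout, and event_date column are hypothetical, and the pattern relies on Delta Lake's replaceWhere option, which overwrites only the matching partition so a rerun cannot append duplicates.

# Minimal sketch of an idempotent partition rerun; paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partition-rerun").getOrCreate()

RUN_DATE = "2024-01-15"            # the partition being reprocessed
SOURCE = "/mnt/bronze/events"      # raw input (hypothetical path)
TARGET = "/mnt/silver/events"      # Delta output (hypothetical path)

# Re-read only the failed day's input.
df = (spark.read.format("delta").load(SOURCE)
      .filter(F.col("event_date") == RUN_DATE))

# replaceWhere atomically replaces just this partition, so re-running the
# job for the same date overwrites instead of appending duplicates.
(df.write.format("delta")
   .mode("overwrite")
   .option("replaceWhere", f"event_date = '{RUN_DATE}'")
   .save(TARGET))

For fixes that do not align with a partition boundary, a Delta MERGE keyed on a business identifier achieves the same idempotency.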
Required Skills & Qualifications
5+ years in data operations/support (L1/L1.5), preferably on Databricks + Azure.
Hands-on with:
Databricks Jobs/Workflows, Clusters, Repos, and Delta Lake basics.
Azure Data Lake Storage (ADLS) and Azure Data Factory (ADF) monitoring.
PySpark basics: reading/writing Delta/Parquet, partitioning, checkpoints (see the sketch after this list).
Git (GitHub/Azure Repos) for config, notebooks, and runbook versioning.
Strong grasp of observability:
Reading Spark UI, job logs, and driver/executor logs.
Experience with Log Analytics, Azure Monitor, App Insights (preferred).
ITIL/Service Management familiarity: Incident, Change, Problem, Knowledge.
Scripting (Bash/PowerShell/Python) for small automation tasks.
Excellent communication, documentation, and shift handover discipline.
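As a calibration aid for the "PySpark basics" bullet, here is a short sketch, with hypothetical paths and columns, covering Delta/Parquet I/O, a partitioned write, and a streaming checkpoint:

# Sketch of the PySpark basics listed above; all paths/columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-basics").getOrCreate()

# Batch: read Parquet from a landing zone, write Delta partitioned by date.
raw = spark.read.parquet("/mnt/landing/orders")
(raw.write.format("delta")
    .partitionBy("order_date")
    .mode("append")
    .save("/mnt/bronze/orders"))

# Streaming: the checkpointLocation lets a restarted job resume exactly
# where it stopped instead of reprocessing or skipping data.
stream = spark.readStream.format("delta").load("/mnt/bronze/orders")
query = (stream.writeStream.format("delta")
         .option("checkpointLocation", "/mnt/checkpoints/orders")
         .outputMode("append")
         .start("/mnt/silver/orders"))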
Nice-to-Have
Exposure to Structured Streaming and Kafka/Event Hub monitoring.
Basic SQL for data validation and health checks (see the sketch after this list).
Understanding of Unity Catalog data governance, lineage, and entitlement workflows.
Experience with Secrets/Key Vault, Managed Identity, and RBAC.
Experience with CI/CD for Databricks deployments (GitHub Actions/Azure DevOps).
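To make the "basic SQL for data validation" item concrete, a minimal sketch of a bronze-to-silver reconciliation check; the table and column names are hypothetical:

# Sketch of a bronze -> silver health check; table/column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-validation").getOrCreate()

checks = spark.sql("""
    SELECT
      (SELECT COUNT(*) FROM bronze.orders WHERE order_date = '2024-01-15') AS bronze_rows,
      (SELECT COUNT(*) FROM silver.orders WHERE order_date = '2024-01-15') AS silver_rows,
      (SELECT COUNT(*) FROM silver.orders
        WHERE order_date = '2024-01-15' AND order_id IS NULL) AS null_keys
""")

row = checks.first()
# Divergent counts or null business keys mean the load needs triage, not a blind rerun.
if row.bronze_rows != row.silver_rows or row.null_keys > 0:
    raise ValueError(f"Validation failed: {row.asDict()}")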
Certifications (Preferred)
Databricks Certified Data Engineer Associate
Microsoft Certified: Azure Data Engineer Associate (DP-203)
ITIL Foundation (v3/v4)
Key Performance Indicators (KPIs)
SLA Adherence: % of jobs meeting SLA; mean time to acknowledge (MTTA).
Incident Metrics: Mean time to resolve (MTTR), incident reopen rate (see the computation sketch after this list).
Operational Hygiene: Runbook completeness, KEDB updates, shift handover quality.
Quality Metrics: Error rates, number of successful reruns without escalation.
Proactive Monitoring: Number of issues prevented via early detection/alerts.
Change Readiness: Zero-defect deployments from the ops perspective.
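For candidates new to these metrics, MTTA and MTTR are plain averages over incident timestamps. A minimal illustration with hypothetical data:

# MTTA/MTTR as averages over incident timestamps; the records are hypothetical.
from datetime import datetime

incidents = [
    {"opened": datetime(2024, 1, 1, 9, 0),
     "acknowledged": datetime(2024, 1, 1, 9, 12),
     "resolved": datetime(2024, 1, 1, 10, 30)},
    {"opened": datetime(2024, 1, 2, 14, 0),
     "acknowledged": datetime(2024, 1, 2, 14, 5),
     "resolved": datetime(2024, 1, 2, 15, 0)},
]

# Mean time to acknowledge / resolve, in minutes.
mtta = sum((i["acknowledged"] - i["opened"]).total_seconds() for i in incidents) / len(incidents) / 60
mttr = sum((i["resolved"] - i["opened"]).total_seconds() for i in incidents) / len(incidents) / 60
print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")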
Tools & Ecosystem
Databricks (Jobs, Clusters, Repos, Workflows, Unity Catalog)
Azure: ADLS, ADF, Key Vault, Event Hub, Log Analytics, Monitor
Version Control: GitHub / Azure Repos
Ticketing: ServiceNow / Jira / Azure DevOps Boards
Observability: Azure Monitor, Log Analytics, Grafana (optional)
Sample Interview Screening Topics (for Hiring Teams)
Ops Scenarios: How to triage a failing Databricks job (OOM, shuffle spill, auth error); see the monitoring sketch after this list.
Logs & Spark UI: Identify the cause from executor logs; interpret stages/tasks/shuffles.
Data Validation: Checkpoint integrity, bronze→silver load verification.
Runbooks: Steps to safely rerun a partitioned pipeline without duplicate writes.
Access/Governance: Handling a permission issue with Unity Catalog tables.
SLA & Escalation: When to escalate vs. when to rerun; SEV classification.
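To give the "Ops Scenarios" topic some flavor, below is a hedged Python sketch that polls the Databricks Jobs API (runs/list, API 2.1) for recently failed runs; the workspace host and token are placeholders, and the response fields should be verified against the current API documentation:

# Hedged sketch: list recent failed Databricks job runs for L1 triage.
# Host and token are placeholders; verify fields against the Jobs API 2.1 docs.
import requests

HOST = "https://<workspace>.azuredatabricks.net"   # placeholder
TOKEN = "<personal-access-token>"                  # placeholder

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"completed_only": "true", "limit": 25},
    timeout=30,
)
resp.raise_for_status()

for run in resp.json().get("runs", []):
    state = run.get("state", {})
    if state.get("result_state") == "FAILED":
        # run_id and run_page_url go straight into the incident ticket.
        print(run["run_id"], run.get("run_page_url"), state.get("state_message"))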
JD Summary (Short Version for Job Portals)
Role: L1 Data Engineering (Databricks) Engineer — 5+ years
Must-have: Databricks Ops, ADF monitoring, ADLS, PySpark basics, ITIL, incident management
Nice-to-have: Unity Catalog, Azure Monitor, CI/CD, streaming
Shift: Rotational/on-call
Certs: Databricks DE Associate, DP-203, ITIL Foundation (preferred)
Durga Karunakaran
TAG Team - HCL Technologies Ltd.
Chennai, India
Durga Karunakaran | LinkedIn