Job Description
Senior Site Reliability Engineer
Location:
Remote (India, Offshore)
Job type:
Full time or Contract
Shift time:
3 PM to 12 AM IST
Total Experience: 6+ years
The Senior Site Reliability Engineer is responsible for the availability, performance, serviceability, and recoverability of production systems supporting flight operations, maintenance, and compliance workflows.
This role owns production reliability outcomes as systems scale, migrate, and evolve within regulated aviation environments.
What You Will Own
SQL to RDS migration experience
Experience with AWS Database Migration Service (DMS) or similar migration tools
Reliability Ownership and Service Health
Own availability, latency, throughput, and durability for production systems
Define and maintain service level indicators and service level objectives
Manage error budgets to guide engineering and operational decisions
Ensure reliability targets are met consistently
Production Architecture and Resilience
Design and operate highly available architectures that span multiple availability zones and multiple regions
Ensure controlled and observable failure behavior
Define redundancy, graceful degradation, and automated recovery strategies
Validate failover and recovery through testing
Incident Response and Operational Maturity
Lead response to production incidents
Own root cause analysis focused on systemic contributors
Drive remediation actions to completion
Reduce incident frequency, severity, and blast radius over time
Observability and Operational Insight
Design centralized logging, metrics, alerting, and dashboards
Define observability standards tied to customer impact
Ensure alerts are actionable and low-noise
Use operational data for capacity planning and scaling decisions
Automation and Toil Reduction
Identify and eliminate manual or repetitive operational tasks
Build automation to reduce operational risk
Standardize operational workflows
Treat simplicity as a reliability requirement
Data and Database Reliability
Own production database reliability
Design replication, backup, restore, and failover strategies
Validate recovery procedures regularly
Lead migrations to managed cloud databases such as AWS RDS or Aurora
Technical Qualifications
Cloud and Infrastructure
Hands-on experience operating production systems on AWS or Azure
Strong understanding of networking, IAM, load balancing, and managed services
Ability to balance cost, reliability, and operational complexity
Distributed Systems
Experience operating distributed systems in production
Strong understanding of partial failure and recovery patterns
Ability to diagnose cross-stack production issues
Observability and Operations
Experience with centralized logging, metrics, and alerting
Ability to design alerts based on service impact
Experience driving improvement from operational data
Programming and Automation
Strong scripting skills using Python, Node.js, or shell
Ability to write production-grade operational tooling
Comfort modifying application code to improve reliability
Databases
Experience operating relational databases in production
Experience with replication, backup, restore, and failover
Experience migrating legacy databases to managed services (preferred)
Preferred Experience
Experience in regulated or safety-critical industries such as aviation
Familiarity with compliance, auditability, and traceability requirements
Experience supporting systems with direct operational impact
Ready to Apply?
Don't miss this opportunity! Apply now and join our team.
Job Details
Posted Date:
March 18, 2026
Job Type:
Construction
Location:
India
Company:
Delta System & Software, Inc.