Job Description
Talent500 is hiring for one of its clients.
Job Title : Senior Site Reliability Engineer
Location : Bangalore, India
Hiring Manager : Director Engineering, Site Leader
About Recast:
Recast Software, located in Minneapolis, MN, empowers organizations to better manage and support users and devices. Our mission is to simplify the work of IT teams and enable them to create highly secure and compliant environments. Our software does this by seamlessly integrating with existing IT infrastructure to quickly remediate issues, ensure compliance, enhance security, and maintain clear visibility across all devices. Recast is a rapidly growing software company whose solution is used by thousands of enterprise organizations in more than 125 countries, impacting millions of devices and (more importantly) the people who use them.
About the Role:
As a Senior Site Reliability Engineer, you will be responsible for ensuring the reliability of our Azure-hosted services. You will define Service Level Indicators (SLIs) and Service Level Objectives (SLOs), strengthen our infrastructure using Infrastructure as Code (IaC), and enhance the safety and speed of our releases through Azure DevOps.
Success in this role will be measured by the attainment of SLOs and the effective use of error budgets. You will be expected to reduce MTTR and manual toil through automation, and to improve cost efficiency without compromising reliability.
This is a remote-first opportunity, reporting to our DevOps, Security, and IT Manager. The role requires participation in a full on-call rotation that provides 24/7 coverage for our SaaS services and backend infrastructure. You will work closely with Engineering, Security, IT, and Support.
What You’ll Do:
Maintain service availability and ensure that Service Level Agreements (SLAs) are consistently met.
Manage error budgets and SLOs.
Expand our observability capabilities by improving metrics, logs, and traces, as well as implementing alerts and auto-remediation processes.
Lead incident response and conduct post-incident reviews to reduce Mean Time to Recovery (MTTR).
Plan for capacity, manage performance, and optimize cloud costs.
Collaborate across North American and European teams, create and enhance documentation, and share and demonstrate best practices.
Minimum Requirements:
5+ years of experience in Site Reliability Engineering or DevOps within cloud environments, with a preference for Azure.
2+ years of experience building and operating CI/CD pipelines using tools such as Azure DevOps (preferred), Jenkins, or GitHub Actions.
3+ years of experience with cloud monitoring, alerting, and automation.
Proficiency in PowerShell (preferred) or Python.
Experience with Infrastructure as Code tools like Bicep (preferred) or Terraform.
Understanding of configuration management fundamentals.
Exposure to cloud networking components such as load balancers, web application firewalls, and firewalls.
Experience with DORA metrics and progressive delivery practices.
Experience managing SQL and NoSQL infrastructure, with general query knowledge.
Experience working in a cloud-first infrastructure, especially Azure, with cloud services (e.g., SignalR, Redis, API Management).
Preferred Knowledge and Skills:
Experience with Azure Monitor, Application Insights, and other Azure services.
Experience with services such as Datadog or Splunk.
Familiarity with a statically typed language such as C# or C++.
Experience with query languages (SQL, Spark, KQL).
What You Bring:
You take initiative.
We have a culture of ownership and progress over perfection. We proactively drive outcomes with self-motivation and determination. We deliver results that matter to our customers and our team.
You get curious.
Curiosity moves us forward. We ask questions, try new things, and learn from mistakes. Challenges are opportunities to explore creative solutions that benefit our customers and drive continuous improvement.
You work together.
We appreciate the power of diverse perspectives. Through open communication, we help one another and leverage our collective expertise for better outcomes. We build trust through teamwork.
You embrace change.
Change is inevitable; we meet it with agility and resilience. We navigate with courage and find possibility in uncertainty. We adapt for the future, shaping our path with purpose.
You choose empathy.
We aim to deeply understand the needs of our customers and one another - it's the foundation of our relationships. We assume positive intent and practice mutual respect. We prioritize a culture of belonging because success is a shared journey.
Compensation, Benefits, & Perks:
TBD
Why do we love working at Recast?
It takes great people across an entire company to build great tools. As a growing start-up, we give every employee an opportunity to make a huge impact on our business, as well as ample opportunities to learn and grow. We have a people-first culture with passionate, talented, and supportive teammates. We are committed to making every employee feel respected and valued. We recognize that to bring our best selves to Recast, it’s important for everyone to nurture their lives outside of work.
Recast provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, creed, gender, sex (including pregnancy, childbirth, and related medical conditions), sexual orientation, gender identity, national origin, age, disability, genetic information or characteristics, marital status, familial status, veteran or military status, status regarding public assistance, membership or activity in a local commission, or any other protected status in accordance with applicable federal, state, local, and international laws.