Service Delivery Manager - Enterprise Support (Remote)

📍 Brazil

Business and Operations, Deep.BI

Job Description

About the company

We provide enterprise support and consulting for open‑source analytics and data infrastructure platforms such as Apache Druid, Apache Flink, StarRocks, and other emerging technologies. Our customers run mission‑critical, high‑volume systems and rely on us to keep them fast, stable, and available. We’re a small, remote‑first team of world‑class experts working across multiple time zones (US, Brazil, Europe, India, Philippines), supporting 100+ customer environments with SLAs ranging from advisory support to 24/7 incident coverage.

About the role

We’re looking for an experienced Service Delivery Manager to take ownership of our service operations: SLAs and incident processes; on‑call and skills coverage; SOPs and first‑line/SRE enablement; configuration management; SLA metrics and reporting; and coordination between customers and our engineering teams.

This is a hands‑on role, not a pure governance role. You will be close to real incidents, engineers, and customers, and you’ll be expected to bring in practices you’ve already used successfully in previous service or managed‑services environments.

What you’ll do

1. Service operations, on‑call & incidents

Design and maintain an on‑call and coverage plan that ensures all critical skills are available when needed (initially weekdays, evolving to full 24/7 where required).
Own the incident management process for your accounts: priorities, roles, communication cadence, escalations, and post‑incident reviews.
Define and monitor key service metrics (e.g., MTTA, MTTR, SLA compliance, backlog health) and drive improvements based on them.
Act as incident lead / coordinator during major incidents, keeping engineers focused and customers informed.

2. SOPs, runbooks & first‑line enablement

Create and maintain SOPs, runbooks, and triage guides for SRE engineers, covering common incident types and operational tasks.
Train and coach first‑line/SRE teams so they can confidently handle initial triage, basic troubleshooting, and clear communication, escalating only when needed.
Continuously refine documentation based on real incident experience and feedback.

3. Configuration management & readiness

Establish and run a configuration management process that keeps track of each customer’s environment (platforms in use, clusters, regions, configs, access, monitoring, key contacts).
Proactively close information gaps by working directly with customers and engineers.
Ensure configuration information is available and trustworthy during incidents and for onboarding new engineers.

4. Customer communication & governance

Be the primary operational contact for a set of enterprise customers.
Lead regular service reviews and status calls, presenting SLA performance, key incidents, risks, and improvement actions.
Present and agree on the incident management process with customers (channels, priorities, escalation paths, expectations).
Work closely with Account Management / Sales on renewals, expansions, and expectation management.

5. Commercial & delivery management

Clarify what is in scope vs. out of scope and work with customers and Sales to shape paid change requests when additional work is needed.
Monitor effort vs. contract, help protect margins, and flag risks early (under‑scoped contracts, chronic over‑use, under‑utilized capacity).
Work in a matrix environment, coordinating with different technical teams (e.g., database engineering, DevOps, SRE) to staff and deliver engagements effectively.

6. Onboarding & training

Design and maintain onboarding paths for new engineers joining support/delivery (shadowing, training on SOPs, environment overviews, “certification” on certain incident types).
Ensure new team members reach a productive, independent state quickly and safely.

What success looks like in 6–12 months

On‑call coverage is clear, predictable, and sustainable; engineers know when they’re on and what’s expected.
First‑line/SREs handle a meaningful share of incidents without escalation, using well‑maintained runbooks.
You can open a customer’s configuration, see an accurate picture, and use it during incidents and planning.
SLA and incident metrics are tracked, reported, and discussed regularly with customers and internally.
Customers have a clear understanding of how incidents are handled and feel confident in the process.
New engineers ramp up faster thanks to structured onboarding and training.

You’ll be a great fit if you have

5+ years in a Service Delivery, Managed Services, IT Operations, or Enterprise Support role serving external customers (not only internal IT).
Experience with 24/7 or extended‑hours operations, including on‑call or follow‑the‑sun setups.
Hands‑on experience with incident management and ITSM practices (incident/problem/change), ideally in an ITIL‑inspired environment.
A track record of creating or improving SOPs/runbooks and training first‑line / SRE teams.
Experience maintaining configuration / environment data for customer systems.
Comfort discussing technical topics with engineers (cloud, distributed systems, data platforms) and explaining them in clear business terms to customers.
Experience in commercial delivery: scope boundaries, change requests, effort vs. revenue, working alongside Sales / Account Management.
Strong communication skills in English, both written and spoken.

Nice to have

Background with data, analytics, or streaming platforms (e.g., Druid, Kafka, Flink, StarRocks, ClickHouse, TiDB, Hadoop, cloud data warehouses).
Experience working in small, fast‑moving, remote teams.

Location & working style

Remote‑first: we collaborate online across multiple time zones.
The role requires regular overlap with European and North American business hours.
We are flexible on contract structure (direct employment, a global payroll partner, or contractor/B2B), depending on your location and preference.

Ready to Apply?

Don't miss this opportunity! Apply now and join our team.

Job Details

Publication Date: December 19, 2025
Job Type: Business and Operations
Location: Brazil
Company: Deep.BI
