Job Description
Job Summary
We are UMG, the Universal Music Group. We are the world’s leading music company. In everything we do, we are committed to artistry, innovation and entrepreneurship. We own and operate a broad array of businesses engaged in recorded music, music publishing, merchandising, and audiovisual content in more than 60 countries. We identify and develop recording artists and songwriters, and we produce, distribute and promote the most critically acclaimed and commercially successful music to delight and entertain fans around the world.
Job Functions
System Reliability & Performance
Design, build, and maintain critical services, ensuring their availability, scalability, and performance.
Develop and maintain robust monitoring, alerting, and observability systems (e.g., using AWS CloudWatch, Dynatrace) to ensure rapid issue detection and resolution.
Monitor infrastructure capacity and performance, and provide analysis and recommendations to improve service delivery.
Automation & Efficiency
Drive the automation of repetitive operational tasks, including infrastructure provisioning, deployments, and scaling.
Create and maintain scripts and custom code to support and enhance our operational toolset.
Support and optimize CI/CD pipelines to improve deployment speed and reliability.
Incident Management & Collaboration
Collaborate with global peers to troubleshoot and mitigate production incidents.
Lead post-incident reviews and root cause analyses to implement lasting solutions.
Partner with engineering and IT stakeholders to embed SRE best practices (SLOs, error budgets) into the design and development lifecycle.
Job Requirements
Required Experience & Skills
A strong background in systems administration (Linux/Windows) in a large-scale environment.
Proficiency in at least one programming language (e.g., Python, Go, Java).
Hands‑on experience with a major cloud platform (AWS, GCP, or Azure), with a strong preference for AWS.
Solid understanding of networking, containers (Docker, Kubernetes), and Infrastructure as Code (e.g., Terraform, Ansible).
Experience with modern monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, Dynatrace).
Proven analytical and problem‑solving abilities, with experience working in high‑pressure environments.
Excellent communication skills and the ability to foster a collaborative team environment.
You will be part of a Monday‑to‑Sunday roster that requires working from the office on weekends.
Preferred Experience & Skills
Bachelor's degree in an IT‑related field.
Experience managing large‑scale, distributed systems for a global organization.
Familiarity with IT governance frameworks such as ITIL.
Direct experience with ServiceNow for IT service management.
Knowledge of chaos engineering, resilience testing, and advanced capacity planning.
Universal Music Group is an Equal Opportunity Employer.
Diversity & Inclusion
At Universal Music we are committed to fostering diversity and inclusivity as an equal opportunity employer. We encourage applicants from all backgrounds to apply for our roles regardless of their gender, race, ethnicity, nationality, age, sexual orientation, gender identity, intersex status, marital or family status, neurodiversity, religion or belief, disabilities, or socio‑economic background. We also encourage people from all cultural backgrounds to apply, including First Nations people. It is through our diversity and inclusivity that we bring together different perspectives, enhancing our creative and evolving workplace. Music is Universal.
Disclaimer
The company presents this job description as a guide to the major areas and duties for which the jobholder is accountable. However, the business operates in an environment that demands change, and the jobholder's specific responsibilities and activities will vary and develop. Therefore, this job description should be seen as indicative and not as a permanent, definitive, or exhaustive statement.
Job Category: Technology