Job Description
100% Remote
US time zone
Contractor / PJ position
Role Overview
In this role, you will shape the reliability and scalability of mission-critical platforms built on Azure, Kubernetes, and modern DevOps toolchains. You will solve complex infrastructure challenges, automate end-to-end operations, and ensure systems operate with high availability, performance, and security.
What You Will Do
- Design, build, and improve CI/CD pipelines for applications and infrastructure.
- Develop automation frameworks that reduce manual effort and increase consistency.
- Configure and optimize cloud infrastructure to align with security, scalability, and performance best practices.
- Collaborate with development teams to remove deployment blockers and improve delivery workflows.
- Monitor reliability and performance, identify issues early, and implement data-driven improvements to increase uptime and efficiency.
- Participate in on-call rotations and drive incident resolution with clear postmortems and preventive actions.
- Maintain technical documentation for pipelines, configurations, and runbooks.
- Perform readiness assessments and validation tests before production rollouts.
- Implement Infrastructure as Code using Terraform and ARM templates with version control and reproducibility.
- Troubleshoot complex deployment, provisioning, and performance issues across multi-cloud and containerized environments.
Minimum Qualifications
- Proven track record in SRE or DevOps roles operating production systems.
- Hands-on experience running production workloads on Kubernetes in a cloud environment, including cluster design, autoscaling, upgrades, and network policies.
- Proven CI/CD delivery using GitHub Actions or Jenkins, including promotion across environments, approvals, and rollback strategies.
- Infrastructure as Code expertise with Terraform and ARM templates, including modules, remote state, workspaces, and policy enforcement.
- Strong scripting in PowerShell, Bash, or Python for automation and diagnostics.
- GitOps experience with Argo CD or Flux, managing multi-environment application delivery and drift remediation.
- Containerization with Docker and Kubernetes, including health probes, PodDisruptionBudgets, resource quotas, HorizontalPodAutoscaler, and operators.
- Networking fundamentals with cloud network security practices such as VNet design, NSGs, Private Link, and ingress controllers.
- Working knowledge of cloud security and compliance, including least privilege, secrets management, audit trails, and control evidence.
- Excellent written and spoken English.
- Ability to collaborate across US time zones.
Preferred Qualifications
- Microsoft Azure certification, such as Developer Associate, Administrator Associate, or DevOps Engineer Expert.
- Observability using Application Insights, Elastic Stack (ELK), Grafana, and Prometheus for metrics, logs, and traces.
- Experience with log aggregation and alerting at scale using Elastic and Prometheus.
- Understanding of high availability, scalability, disaster recovery, and cost optimization strategies.
- Experience managing Windows-based containerized applications.
Plus
- Experience with Google Cloud Platform (GCP) or AWS.