Azure DevOps Engineer

Senior DevOps / SRE Engineer (Contractor)

Senior | Contract - 6 Months

Fully remote (US / LATAM)

About the Engagement

We are seeking a Senior DevOps / SRE Engineer to take ownership of CI/CD pipelines, GitOps infrastructure, Kubernetes operations, and reliability engineering practices supporting a multi-service production platform. This role is critical in enabling safe, frequent deployments and ensuring rapid, structured recovery when incidents occur.

You will work closely with Platform Engineering, Data Platform, and Product squads to ensure teams can ship confidently and operate their services without unnecessary operational burden.

Role Breakdown

CI/CD and GitOps — 80%: GitHub Actions, ArgoCD, deployment safety and rollback patterns
Kubernetes and Infrastructure Operations — 80%: EKS cluster operations, Terraform, Atlantis
Reliability and Observability — 80%: SLOs, Grafana, incident response, on-call
Security and Platform Integrity — 80%: HashiCorp Vault, supply chain security, OPA
Operational Enablement — 20%: Runbook automation, cross-team reliability practices, AI-augmented delivery

Core Responsibilities

Design, build, and maintain CI/CD pipelines across all repositories using reusable GitHub Actions workflows
Own ArgoCD GitOps configuration and manage application promotion from staging to production
Implement deployment safety mechanisms, environment protections, and automated rollback patterns
Operate and upgrade the EKS cluster, including node groups, Karpenter provisioners, and cluster add-ons
Maintain Terraform infrastructure across all environments via Atlantis PR-driven workflows
Define and maintain SLOs, alerting rules, and Grafana dashboards across platform services
Lead incident response and drive structured post-incident review processes
Operate and maintain HashiCorp Vault, including auth backends, policies, and secret engines
Implement supply chain security controls: image scanning, signing, SBOM generation, and OPA policy enforcement
Partner with Security Engineering on network policy, egress controls, and compliance requirements

Operational & Enablement Responsibilities

Automate repeatable operational work and eliminate manual remediation through tooling and runbook automation
Proactively document and maintain runbooks as systems evolve
Use AI tooling to draft infrastructure code and runbook content, validating outputs against security and compliance standards before merging
Partner with product and engineering teams to strengthen reliability practices and reduce developer workflow friction
Communicate clearly and effectively during incidents in a calm, factual, and action-oriented manner

Required Experience

Proven ownership of production-grade CI/CD, GitOps, and Kubernetes operations for multi-service platforms
Experience operating and upgrading Kubernetes clusters (EKS preferred), including autoscaling with Karpenter
Strong experience managing infrastructure-as-code at scale using Terraform, including PR-driven workflows with Atlantis
Demonstrated track record in SLO definition, alert tuning, dashboard design, incident response, and post-incident reviews
Experience operating HashiCorp Vault and implementing security controls in delivery pipelines
Strong cross-functional collaboration skills enabling multiple squads to deploy safely and independently

Technical Skills

Kubernetes: Expert cluster operations, node group management, Karpenter, RBAC, PodDisruptionBudgets, topology spread constraints
GitOps: ArgoCD Application/Project management, sync policies, drift detection, automated rollback
CI/CD: GitHub Actions — reusable workflows, matrix builds, secrets handling, environment protection rules, deployment gates
Infrastructure as Code: Terraform at production scale — module design, S3 state + DynamoDB locking, Atlantis apply workflows
Service Mesh: Istio traffic management, mTLS, AuthorizationPolicy, circuit breaking, observability integration
Autoscaling: KEDA and Karpenter — event-driven autoscaling, Spot instance management, bin-packing, interruption handling
Observability: Prometheus, Grafana (dashboard-as-code), Loki, Tempo, Alertmanager
Secrets Management: HashiCorp Vault — auth backends, dynamic secrets, PKI, audit logs
Supply Chain Security: Trivy, Cosign, SBOM generation, OPA/Gatekeeper, Cilium network policy
Scripting: Strong Python and Bash for automation, tooling, and runbook automation

Generative AI & Agentic Systems

Integrates AI-powered quality gates into CI/CD pipelines, including automated code review bots, LLM-assisted security scanning, and AI-generated change risk summaries
Uses AI agents to accelerate scaffolding of Terraform modules, Kubernetes manifests, and Helm charts, validating all outputs against security and compliance standards
Applies AI-assisted techniques in incident response: log correlation, runbook step suggestions, and drafting post-incident reports from structured incident data
Contributes to Prompt Execution Sandbox and Agent Gateway infrastructure requirements from a reliability and security perspective
Uses AI tooling to enhance SLO analysis, alert tuning, and capacity planning modelling

Ways of Working

Automation-First: Automates all repeatable work, prioritising reliability and eliminating manual fixes wherever possible
Proactive Documentation: Maintains current, structured runbooks and documentation before incidents occur
AI-Augmented Delivery: Leverages AI tooling to accelerate delivery while maintaining strict validation against security and compliance policies
SLO-Driven Reliability: Treats SLOs as firm commitments and raises reliability risks before they manifest as incidents
Structured Incident Communication: Communicates clearly during incidents and ensures disciplined follow-through via post-incident reviews

Interview Process

Round 1: Meet the founders
Round 2: Technical interview
Round 3: Short technical take-home exercise
Round 4: Interview with the customer

Why Adroit Cloud Consulting?

At Adroit, we are committed to fostering an environment where excellence is not just encouraged — it's expected. We offer the opportunity to work with a team of highly skilled professionals who are passionate about technology and innovation. With a flexible working environment and the support to grow your career, Adroit is the ideal place for ambitious engineers looking to make a significant impact.

