Last updated: March 18, 2026

Senior SRE Engineer (Temporary Contract)

This is a high‑impact, high‑expectation senior DevOps/SRE role supporting a Private Equity Group (PEG) platform. The team is new, the infrastructure is new, and the customer is extremely selective and private.

🧓🏽 Senior · 💬 English · ✈️ International position · 🌍 100% Remote

Via Smartrecruiters

About

You Bring to Applaudo the Following Competencies

  • Proven ownership of production-grade CI/CD pipelines using GitHub Actions reusable workflows and GitOps automation with ArgoCD.
  • Expert-level Kubernetes and EKS operations, including node group management, Karpenter autoscaling, RBAC, PDBs, and topology constraints.
  • Production-scale Terraform expertise, including module design, S3 + DynamoDB remote state, and PR-driven workflows via Atlantis.
  • Strong reliability engineering experience, including SLO/SLI design, alerting strategies, dashboards, incident response, and post-incident reviews.
  • Hands-on experience operating HashiCorp Vault, including auth backends, PKI, dynamic secrets, and audit logging.
  • Experience implementing supply-chain security controls, including image scanning and signing, SBOM generation, and policy enforcement with OPA/Gatekeeper.
  • Strong experience with observability stacks, including Prometheus, Grafana, Loki, Tempo, and Alertmanager.
  • Experience with service mesh technologies such as Istio, including traffic management, mTLS, AuthorizationPolicies, and circuit breaking.
  • Scripting ability using Python and Bash for automation and operational tooling.
  • Active use of AI-assisted engineering tools such as Cursor, GitHub Copilot, or Cloud Code to accelerate IaC development, incident response, and runbook generation.
  • Strong communication skills, with the ability to communicate clearly and confidently with VP-level stakeholders during operational incidents.
  • Advanced English proficiency, as you will work directly with US-based clients.

You Will Be Accountable for the Following Responsibilities

  • Design and maintain GitHub Actions reusable workflows across a multi-repository ecosystem.
  • Own GitOps deployments through ArgoCD, including promotion workflows, sync policies, drift detection, and automated rollback strategies.
  • Implement deployment safety mechanisms such as environment protections, concurrency rules, and verification gates.
  • Operate and upgrade EKS clusters, including Karpenter provisioning, node groups, and critical cluster add-ons.
  • Maintain Terraform-driven infrastructure and enforce PR-driven workflows through Atlantis.
  • Define and maintain SLOs, SLIs, alerting rules, and monitoring dashboards across platform services.
  • Lead incident response, coordinate recovery efforts, and execute structured post-incident reviews.
  • Participate in an on-call rotation and contribute to improving operational processes.
  • Operate and maintain HashiCorp Vault, including policies, authentication backends, and secret engines.
  • Implement supply-chain security controls, including Trivy scanning, Cosign signing, SBOM generation, and OPA/Gatekeeper enforcement.
  • Partner with Security Engineering on network policies, egress controls, and compliance standards.
  • Automate repetitive tasks and maintain proactive runbooks to reduce operational risk.
  • Use AI tools to improve infrastructure automation, documentation, and deployment safety validation.
  • Collaborate with product teams to strengthen SLOs and deployment safety practices.
  • Challenge technical assumptions and advocate for scalable, secure DevOps architectures.

Qualifications

  • Proven end-to-end ownership of production-grade Kubernetes/EKS environments, including Karpenter and Atlantis-driven Terraform workflows.
  • Demonstrated expertise with ArgoCD GitOps patterns.
  • Hands-on experience with HashiCorp Vault, supply-chain security controls, and structured incident response including on-call rotations and post-incident reviews.
  • Active use of AI-assisted tools such as Cursor, GitHub Copilot, or Cloud Code as part of daily engineering workflow.
