Last updated: October 27, 2025

Site Reliability Engineer - Data Platform

🌍 100% Remote | ✈️ International position | 💬 English | 🧓🏽 Senior

Via Ashbyhq

About

The team

Join our Data Infrastructure team and play a pivotal role in upholding the reliability, scalability, and efficiency of our robust Data platform. As a Senior Site Reliability Engineer (SRE) specialized in Data Infrastructure, you will collaborate closely with diverse cross-functional teams to conceive, execute, and oversee the foundational data infrastructure that empowers our array of applications and services.

The opportunity

  • Implement self-service data infrastructure solutions that support the needs of 10+ business units and over 100 engineers and data analysts
  • Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform
  • Develop and maintain bash/shell scripts to automate operational tasks and deployments
  • Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure
  • Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues
  • Manage and implement role-based access control (RBAC) and permissions for a multitude of user groups and machine workflows across different environments
  • Manage and maintain real-time streaming data architecture using technologies like Kafka and Debezium Change Data Capture (CDC)
  • Ensure the timely and accurate processing of streaming data, enabling data analysts and engineers to gain insights from up-to-date information
  • Utilize Kubernetes to manage containerized applications within the data infrastructure, ensuring efficient deployment, scaling, and orchestration
  • Implement effective incident response procedures and participate in on-call rotations
  • Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement appropriate solutions
  • Document architecture, processes, and best practices to enable knowledge sharing and support continuous improvement
  • Support AI/ML teams with their infra requests

Skills you should HODL

  • Proven experience (5+ years) as a Site Reliability Engineer, Infrastructure Engineer, Data Infrastructure Engineer, or in a similar role, with a focus on data infrastructure and security
  • Experience with maintaining real-time data processing technologies, such as Kafka and Flink clusters and Debezium instances
  • Working experience managing hybrid multi-tenant cloud systems, particularly on AWS
  • Infrastructure as Code tools such as Terraform, Terragrunt and Atlantis
  • Experience with containerization and orchestration tools, particularly Kubernetes, Nomad, and Docker
  • Solid understanding of bash/shell scripting and proficiency in at least one programming language (preferably Python or JVM languages)
  • Experience maintaining data-related technologies: Apache Airflow, Apache Spark, DBs, BI tooling
  • Experience solving data access management issues in a large-scale data lake
  • Familiarity with CI/CD deployment pipelines and related tools
  • Strong problem-solving skills and the ability to troubleshoot complex systems
  • Experience with data-related technologies (databases, data lakes, airflow, spark) is a plus
