Last updated: June 17, 2025

Senior Data Platform Engineer

🌍 100% Remote · 💬 English · ✈️ International position · 🧓🏽 Senior

Via Lever

About

Responsibilities:

  • Design & Optimization: Build and fine-tune data clusters to support both batch and streaming workloads, ensuring optimal performance and reliability.
  • Platform Development: Build and expand our Spark, Hadoop, Kubernetes, Trino, Delta Lake, and Druid ecosystems to meet evolving business needs, adding new integrations, data ingestion, and data transforms as needed.
  • Innovation: Introduce and scale new data platform solutions, iterating on our OLAP platforms and exploring next-generation data formats.
  • Collaboration: Work closely with cross-functional teams, including infrastructure engineers, to align platform capabilities with organizational goals.

Required qualifications:

  • Distributed Systems Expertise: Proven experience in scaling and tuning large deployments of Spark-on-Kubernetes and Spark-on-Hadoop.
  • Object Storage Solutions: Knowledge of open-source S3 alternatives, including Ceph and MinIO.
  • Storage Systems Knowledge: In-depth understanding of Hadoop and the HDFS protocol.
  • Performance Tuning: Skilled in designing and optimizing shuffle-heavy systems, utilizing YARN or Kubernetes with remote shuffle services.
  • Lakehouse Technologies: Hands-on experience with at least one lakehouse table format, such as Delta Lake, Apache Iceberg, or Apache Hudi.
  • OLAP Systems: Familiarity with OLAP technologies, including ClickHouse, Apache Druid, Apache Pinot, or Apache Doris.
  • Communication Skills: Strong ability to collaborate with diverse stakeholders and effectively communicate complex technical concepts.
  • Problem-Solving: Proven track record of troubleshooting and resolving issues in large-scale, production environments.

Preferred qualifications:

  • Advanced Data Formats: Experience with next-generation and multi-modal data formats, such as LanceDB.
  • Self-Service Platforms: Background in building self-service stateful platforms.
  • Accelerated Runtimes: Familiarity with native or accelerated runtimes for Spark, such as Apache DataFusion Comet, Apache Gluten, or NVIDIA RAPIDS.

Other Information

We have selected the key information about this position. To see the full description, click "access".
