Last updated: June 17, 2025

Senior Data Platform Engineer

Via Lever

About

Responsibilities:

Design & Optimization: Build and fine-tune data clusters to support both batch and streaming workloads, ensuring optimal performance and reliability.
Platform Development: Build and expand our Spark, Hadoop, Kubernetes, Trino, Delta Lake, and Druid ecosystems to meet evolving business needs, adding new integrations, data ingestion, and data transforms as needed.
Innovation: Introduce and scale new data platform solutions, iterating on our OLAP platforms and exploring next-generation data formats.
Collaboration: Work closely with cross-functional teams, including infrastructure engineers, to align platform capabilities with organizational goals.

Required qualifications:

Distributed Systems Expertise: Proven experience in scaling and tuning large deployments of Spark-on-Kubernetes and Spark-on-Hadoop.
Object Storage Solutions: Knowledge of open-source S3 alternatives, including Ceph and MinIO.
Storage Systems Knowledge: In-depth understanding of Hadoop and the HDFS protocol.
Performance Tuning: Skilled in designing and optimizing shuffle-heavy systems, utilizing YARN or Kubernetes with remote shuffle services.
Lakehouse Technologies: Hands-on experience with at least one lakehouse table format, such as Delta Lake, Apache Iceberg, or Apache Hudi.
OLAP Systems: Familiarity with OLAP technologies, including ClickHouse, Apache Druid, Apache Pinot, or Apache Doris.
Communication Skills: Strong ability to collaborate with diverse stakeholders and effectively communicate complex technical concepts.
Problem-Solving: Proven track record of troubleshooting and resolving issues in large-scale production environments.

Preferred qualifications:

Advanced Data Formats: Experience with next-generation and multi-modal data formats, such as Lance (the format underlying LanceDB).
Self-Service Platforms: Background in building self-service stateful platforms.
Accelerated Runtimes: Familiarity with native or accelerated runtimes for Spark, such as Apache DataFusion Comet, Apache Gluten, or NVIDIA RAPIDS.

Other Information

This listing summarizes the key information about the position. For the full description, see the original posting.
