Last updated: July 8, 2025
Senior Data Engineer
Via Greenhouse
About
As a Senior Data Engineer for our Data Platform, you will shape the future of how Wikimedia’s vast data ecosystem serves both our internal teams and the global community. You will help drive the Data Platform Engineering team’s effort to unify data systems across the Wikimedia Foundation to deliver scalable solutions that support internal and external platform users and the open knowledge movement.
Experience
- 5+ years of data engineering experience, with a significant portion focused on on-premise systems (e.g., Hadoop, HDFS).
- Practical knowledge of engineering best practices with a strong emphasis on system robustness and maintainability.
- Hands-on experience in troubleshooting systems and pipelines for performance and scaling.
- Consistent tenure at previous employers (e.g., an average of 2+ years per role, ideally including longer engagements).
- Desirable: Exposure to architectural/system design or technical leadership tasks.
- Desirable: Experience in data governance, data lineage, and data quality initiatives.
Skills
Core Technical Skills
- Expertise in tools like Airflow, Kafka, Spark, and Hive.
- Advanced proficiency in Python and Java/Scala, with deep knowledge of one language and its ecosystem.
- Advanced working knowledge of SQL and experience with various database/query dialects (e.g., MariaDB, HiveQL, Cassandra CQL, Spark SQL, Presto).
Bonus Skills
- Familiarity with additional technologies such as Flink, Iceberg, Druid, Presto, Cassandra, Kubernetes, and Docker.
- Expertise in AI development tooling and AI applications in data engineering and analytics.
Other Skills
- Familiarity with stream processing frameworks like Spark Streaming or Flink.
- Strong communication and collaboration skills to interact effectively within and across teams.
- Ability to produce clear, well-documented technical designs and articulate ideas to both technical and non-technical stakeholders.
Responsibilities
- Designing and Building Data Pipelines: Develop scalable, robust infrastructure and processes using tools such as Airflow, Spark, and Kafka.
- Monitoring and Alerting for Data Quality: Implement systems to detect and address potential data issues promptly.
- Supporting Data Governance and Lineage: Assist in designing and implementing solutions to track and manage data across pipelines.
- Data Platform Development: Contribute to the design and improvement of the shared data platform, enabling critical use cases such as product analytics, bot detection, and image classification.
- Enhancing Operational Excellence: Identify and implement improvements in system reliability, maintainability, and performance.