Last updated: October 16, 2025

Senior Quality Engineer (AI)

💬 English ✈️ International position 🌍 100% Remote 🧓🏽 Senior

Via Ashbyhq

About

Our client is committed to building innovative and scalable solutions that drive efficiency and impact. We foster a culture of continuous learning, collaboration, and proactive problem-solving. If you're looking for an environment where you can grow and make a difference, we want to hear from you!

Job Summary

As we advance our AI development efforts, we recognize the need for more than a traditional QA engineer. We are seeking a GenAI Quality Coach: a strategic, hands-on role that blends test innovation, prompt effectiveness analysis, and user feedback insights.

This individual will help shape and evolve our QA practices specifically for GenAI systems. They will partner closely with developers, product owners, and SMEs to ensure we are building robust, safe, and high-quality AI features.

Responsibilities

1. Define GenAI-Specific QA Strategy

- Develop a QA framework tailored to GenAI systems and workflows.

- Design tests for:

  • Prompt behavior across varied inputs and user tasks
  • Hallucination detection
  • Factual consistency and groundedness

- Blend manual and automated test design for both deterministic and stochastic outputs.

- Collaborate with teams to obtain or create sample data with clear target outputs.

2. Test Plan Ownership

- Own end-to-end test strategy and execution for GenAI-powered features.

- Ensure coverage across:

  • Diverse prompt phrasing, user intents, and failure modes
  • Multiple GenAI features (e.g., summarization, generation, classification)
  • High-risk, edge-case, and compliance-driven scenarios

3. Prompt Validation & Evaluation

- Lead the design and implementation of prompt and model evaluation protocols:

  • Alignment between user input and intended behavior
  • Output fluency, tone, and coherence
  • Clarity, coverage, and relevance of responses

- Use golden datasets and benchmark prompts to establish evaluation baselines.

4. Human-in-the-Loop (HITL) Evaluation

- Design and manage SME-driven review workflows.

- Facilitate structured reviews focused on:

  • Correctness/accuracy based on metrics and SME feedback
  • Capturing edge-case failures

5. Reporting and KPIs

- Define and track QA effectiveness using metrics such as:

  • Pass rate for high-risk use-cases
  • HITL reviewer agreement rates and flagging of critical issues
  • Use-case specific measures of “quality”

- Deliver clear, actionable dashboards and reports to leadership on AI quality, safety, and readiness.

Qualifications and Job Requirements

You might be a great fit if you:

  • Are excited by the complexities and challenges of GenAI testing.
  • Think like a product owner, act like a tester, and communicate like a coach.
  • Thrive in ambiguity and enjoy shaping new standards.
  • Are passionate about safe, responsible AI development.


