
Middle AI Test Engineer (LLM)

SoftButler · djinni · Middle · $$$ · Hybrid work format · Worldwide
SoftButler seeks a Middle AI Test Engineer to ensure the quality of LLM-based systems through structured testing approaches. We're an innovation-driven startup valuing ownership and technical depth. In this role, you will focus on testing AI chatbots, evaluating model outputs, and contributing to reliable AI testing practices.
RESPONSIBILITIES:
Develop and maintain a testing framework based on promptfoo for LLM testing.
Write tests for AI chatbots: functional and security-focused (Red Teaming, Security, Privacy).
Configure and maintain evaluation of model responses (LLM-as-a-judge).
Test multi-step conversations and complex dialogue flows (multi-conversation flow).
Generate synthetic datasets for various AI testing scenarios.
Write clear, well-structured, commented code that doubles as educational material for students at different levels.
REQUIREMENTS:
B2 English (for communication and technical documentation).
Strong understanding of QA principles and test design techniques.
Experience with JavaScript or Python.
Basic understanding of LLM behavior (prompting, limitations, response evaluation).
Analytical thinking and attention to detail.
Ability to write clean, readable, and well-structured code.
NICE TO HAVE:
Experience with prompt engineering or tools like promptfoo.
Basic understanding of AI safety (Red Teaming, prompt injection).
Experience with conversational AI or chatbots.
Understanding of NLP basics.
WHAT WE OFFER:
Ownership: contribute to building AI testing practices from scratch.
Mentorship: support from experienced engineers and knowledge sharing.
Career growth: development toward AI QA / AI Quality Engineer roles.
Flexible work: fully remote or hybrid (Kyiv office).
Full-time position: long-term collaboration.
Corporate events: team-building and community activities.