
ML Engineer — Structured Data Extraction

Labelwise · djinni · $$$ · Remote only · Worldwide
Remote · Contract · EU time zone preferred · Capacity: propose what the scope needs.
We pay for outcomes, not hours. You own the pipeline; we own the product.
Contract engagement, scope-driven. Propose your own structure — trial period, fixed-price milestones, retainer, or a hybrid. Whatever fits how you actually work.

We're building the definitive structured quality, safety & value dataset for the global supplement market. Two customers: consumers (React Native app) and AI systems (structured API).
The data pipeline IS the product.

What you deliver:
A pipeline that generalizes across our module types: boolean classification, multi-class classification, characteristic extraction, entity extraction, entity linking/grouping, and structured attribute profiling.
Core deliverables:
Golden-record schema and workflow that works for any of the above, co-built with our regulatory SME
Eval harness in CI — accuracy reported on every commit, per module
Versioned prompt/rule system with rollback
Measured, reproducible accuracy numbers on held-out sets, per module, against written gates
A workflow our SME can operate without a developer in the loop
Success = numbers we trust, on sets we trust, with a pipeline that won't silently regress — and a framework that scales to every remaining module without rework.
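
As a rough illustration of "accuracy per module against written gates", the check run on each commit could look like the sketch below. This is a minimal example under stated assumptions, not Labelwise's actual harness: the module names, gate values, and every function and class name here are hypothetical, and exact-match accuracy is assumed as the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of one module's evaluation against its written gate."""
    module: str
    accuracy: float
    gate: float

    @property
    def passed(self) -> bool:
        return self.accuracy >= self.gate


def accuracy(predictions: list, golden: list) -> float:
    """Exact-match accuracy of predictions against golden records."""
    if len(predictions) != len(golden):
        raise ValueError("predictions and golden records must align one-to-one")
    if not golden:
        raise ValueError("held-out set is empty")
    correct = sum(p == g for p, g in zip(predictions, golden))
    return correct / len(golden)


def check_gates(runs: dict, gates: dict) -> list:
    """Score every module; a module without a written gate is an error."""
    results = []
    for module, (preds, gold) in sorted(runs.items()):
        if module not in gates:
            raise KeyError(f"no written gate for module {module!r}")
        results.append(GateResult(module, accuracy(preds, gold), gates[module]))
    return results


def exit_code(results: list) -> int:
    """Non-zero exit fails the CI job, so a regression cannot pass silently."""
    return 0 if all(r.passed for r in results) else 1


if __name__ == "__main__":
    # Hypothetical held-out predictions and golden records for two modules.
    runs = {
        "is_vegan": ([True, False, True, True], [True, False, False, True]),
        "dosage_form": (["capsule", "tablet"], ["capsule", "tablet"]),
    }
    gates = {"is_vegan": 0.90, "dosage_form": 0.95}  # hypothetical gate values
    results = check_gates(runs, gates)
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        print(f"{r.module}: {r.accuracy:.3f} (gate {r.gate:.2f}) {status}")
    raise SystemExit(exit_code(results))
```

Treating a missing gate as an error rather than a skip is a deliberate choice in this sketch: a new module cannot enter CI without written acceptance criteria.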

Stack:
Python, DSPy, vLLM, PaddleOCR, RT-DETR, open-weight VLMs and LLMs. Laravel/Filament backend (you consume it, our backend team maintains it).

Required:
3+ years shipping production ML/data pipelines — not research
Prompt optimization at scale: DSPy / BootstrapFewShot / MIPROv2, or equivalent (LangChain + pytest eval harness counts)
VLM inference serving (vLLM / TGI / llama.cpp)
OCR pipelines wired to detection models
Eval-first: you ship the harness before the pipeline
Clean Python, git discipline, versioned prompts
You've maintained a pipeline in production for 12+ months (not just built and handed off)

Not a fit if you:
Propose RAG for structured extraction
Want to fine-tune before prompt-optimizing
Want to pre-train a domain model
Need a PM to translate goals into tickets

Nice to have:
Comfortable reading enough PHP/Laravel to integrate with our Filament admin.

We provide:

Direct CEO collaboration — fast decisions.
Regulatory SME owns golden records and sign-off
Backend team handles schema and infra
Written acceptance criteria before each module

Compensation: fixed-price per milestone, or retainer. Propose your structure with your application.

How to apply?
Fill out the questionnaire below and submit it.
NB:
No cover letters. No AI-generated applications — we'll notice.