Job VC
Data Scientist — AI / Product Intelligence
Description
We're partnered with an AI company that cleans up and enriches messy product data (PDFs, spreadsheets, images, CAD files), turning it into clean, marketplace-ready catalogs for brands and retailers selling on Shopify, Amazon, Walmart, and large PIM/ERP systems. We're helping the company hire a few more engineers, and this Data Scientist role is one of the openings.
Stack:
Python / FastAPI on the backend, React / Next.js on the frontend, LLMs throughout.
Engagement:
Full-time, fully remote.
=====
You'll own the ML that powers the enrichment, classification, and validation pipelines — the core of the product. Real data, large scale, noisy inputs, a lot of LLM work alongside classical ML.
What you'll do
Build classification, extraction, and enrichment models over product data (text, images, metadata)
Design and ship LLM pipelines – RAG, extraction, validation – in production, not just notebooks
Run experiments and build evaluation frameworks you actually trust
Work with engineering to deploy models, not hand them off
We're looking for
4-7+ years in data science / ML
Strong Python (pandas, sklearn, PyTorch, TensorFlow, etc.)
Real experience with LLMs, NLP, CV
Solid instincts on model evaluation and data quality
You've worked on messy real-world data and know how to handle it
You're comfortable working hours that overlap significantly with US Pacific Time
Your conversational English is strong (you will be speaking with your teammates daily)
Nice to have
Product / catalog / e-commerce data experience
Experience with vector DBs, embeddings, RAG in production