Job VC

Senior Data Engineer

8allocate · djinni · $$$$ · Remote only · EU countries

Description

We are looking for a Senior Data Engineer to join our growing engineering team. This person will be responsible for designing, building, and operating the data pipelines and warehousing infrastructure that power our products and partner integrations. The role centers on Apache Airflow orchestration, ETL/ELT in Python, and PostgreSQL data modeling at non-trivial scale.
This is a senior position. We expect end-to-end ownership - dive into a problem, build the context, propose a path, ship it, and operate it - pulling in others when their expertise is genuinely required. This person will work directly with engineering leadership and integrate tightly with our backend service teams.
The ideal candidate brings hands-on experience designing and operating production Airflow deployments, strong Python and SQL fluency, and a working comfort with messy vendor data. They actively leverage AI-powered tooling - Claude Code, Cursor, MCP, and similar - as a force multiplier in day-to-day delivery, and they have an informed point of view on where this is going. They are collaborative, detail-oriented, and thrive in a fast-paced, agile environment.
Position:
New position. Team growth.
Work schedule:
9-10 a.m. to 6-7 p.m., flexible working hours.
Location:
Remote.
English:
Upper-Intermediate.
Technical stack:
Orchestration:
Apache Airflow (multiple production DAGs)
Languages:
Python 3.12+, SQL (PostgreSQL); Java reading literacy is a plus
Data libraries:
Pandas, NumPy, SQLAlchemy
Databases:
PostgreSQL with managed migrations (Flyway-style)
Cloud:
Azure and / or AWS
AI / Agentic:
Claude Code, Anthropic API, MCP (Model Context Protocol), Ollama, internal agent platform
Adjacent stack you'll interact with:
Java, Spring Boot, Hibernate, Querydsl, Debezium (CDC), WebSocket / SSE
Requirements:
Real pipeline ownership.
You've designed, shipped, and operated production Airflow (or comparable orchestrator) at non-trivial scale. You know what breaks at 2 AM and how to keep it from waking the rest of the team.
Strong Python data manipulation.
Pandas, NumPy, and SQLAlchemy are working tools, not interview trivia.
Real SQL fluency.
You think in joins, indexes, and execution plans. Stored procedures and migrations don't intimidate you.
Vendor-data realism.
You're comfortable reading vendor documentation, integrating messy external APIs, and translating "how is your data shaped?" into a workable schema in front of a partner.
AI-first instinct.
You already use Claude Code, Cursor, or comparable tooling daily, and you push it harder than your peers do. You have a developing point of view on agent design, prompts, and MCP.
Self-direction.
Senior here means you find the work, build the context, decide the approach, and ship - escalating only when escalation is genuinely required.
Distributed-systems sensibility.
You understand event-driven patterns, idempotency, replayability, and the realities of real-time data processing.
Nice to have:
Java reading literacy. Some of our DAGs invoke Java-based containers; cursory comfort speeds up triage. Backend engineers and AI agents can fill in the rest.
Hands-on with CDC (Debezium) and event-driven architectures
Experience with cloud data warehouses (Snowflake, Synapse, BigQuery) or lakehouse architectures
Financial / market / reference data background; familiarity with crypto pricing, securities reference, or fund accounting data
An MCP server, Claude skill, custom agent, or hook you've built or extended
dbt, Kafka, Event Hubs, or comparable data tooling
Responsibilities:
Pipeline Engineering & Orchestration:
Design, build, and operate Apache Airflow DAGs that ingest data from financial vendors, partners, and internal systems
Architect ETL/ELT pipelines in Python with appropriate batching, idempotency, retry, backfill, and lineage characteristics
Integrate with messy external vendor APIs and file feeds; handle schema drift, late data, and partial failures gracefully
Build and maintain background workers and scheduled services for process automation
Establish observability for pipelines: logging, metrics, alerting, and runbooks for on-call response
Data Modeling & SQL:
Design and optimize PostgreSQL schemas across operational and analytical domains
Author and tune complex SQL: stored procedures, views, window functions, indexing strategies, and query plans
Implement managed database migrations and versioning using existing tooling
Model reference, market, and transactional data into clean, queryable shapes that downstream services and analytics can rely on
AI-Augmented Engineering:
Pair with AI agents on data operations: direct them, review their work, and push back when they're wrong
Identify practical opportunities to apply AI within both the development toolset and product features
Contribute to internal agent skills, prompts, and MCP integrations that improve pipeline diagnostics, schema management, vendor onboarding, and code review
Stay current on emerging AI patterns and bring back what works
Cross-Functional Delivery:
Sit in on partner and client calls, map their data into our model, and ship integrations without long handoffs
Contribute to architecture decisions across the data platform - we don't gatekeep that to a small group
Collaborate with backend engineers on Java / Spring Boot services that consume or produce data your pipelines manage
Document approaches and decisions clearly enough that future-you and your colleagues can build on them
Benefits from 8allocate:
Team & Culture: Team events, offsites, and a culture that keeps people connected.
Learning & Development: Budget for courses, certifications, and conferences.
Wellbeing: Flexible benefits in line with company policy, with options that support your physical and mental wellbeing (sport, mental health, or medical insurance).
Rest & Recovery: Paid vacation and sick leave.