
Senior Data Engineer (US-Based Product, Real-Time Data Platform)

Hanna Robulets Company · djinni · Senior · $$$$ · Remote only · European countries and Ukraine

Technologies

AI, AWS, Active Directory, Amazon S3, BigQuery, CI/CD, Docker, ETL, Git, GitHub, GitLab, Python, SQL, Telemetry, dbt

Description

About the Product
We are building a US-based, data-driven product with a strong focus on scalability, performance, and cost efficiency.
Our mission is to design a modern data platform that transforms raw behavioral and monetization data into reliable, actionable business insights in near real time.

For us, data engineering is not just about moving data.
It’s about:
Designing resilient architecture
Optimizing for performance and cost
Building reliable automation
Ensuring architectural integrity at scale

Role Overview
We are looking for a Senior Data Engineer who will take ownership of the data platform architecture and drive technical excellence across ingestion, modeling, and performance optimization.
This role requires deep expertise in SQL, Python, AWS infrastructure, and modern data stack principles. You will not only build pipelines — you will define standards, lead architectural decisions, and proactively improve system efficiency.
You will play a critical role in ensuring that data flows seamlessly from event streams to business-ready datasets while maintaining high performance, reliability, and cost control.

What Makes This Role Senior-Level

As a Senior Data Engineer, you will:
Own architectural decisions for the data platform
Identify scalability bottlenecks before they become incidents
Optimize data infrastructure for performance and cost
Lead technical code reviews and set engineering standards
Mentor mid-level engineers
Act as a technical partner to Product and Analytics stakeholders
Balance real-time and batch processing strategies

Technical Requirements
Must-Have
Expert-Level SQL
Complex analytical queries and window functions
Query optimization and execution plan analysis
Identifying and eliminating performance bottlenecks
Reducing query complexity and compute costs
Designing partitioning and clustering strategies
Python
Advanced data manipulation
Building scalable ETL/ELT frameworks
Writing production-grade data services
Automation and monitoring scripts
AWS Core Infrastructure
Amazon Data Firehose, formerly Kinesis Data Firehose (near-real-time data streaming)
Amazon S3 (data lake architecture and storage optimization)
Designing reliable ingestion layers
Version Control
Git (GitHub / GitLab)
Branching strategies
Leading technical code reviews
Enforcing best practices in code quality

Nice-to-Have
Modern Data Stack
dbt (modular SQL modeling, documentation, testing)
Experience structuring layered data models (staging → intermediate → marts)
Data Warehousing
Google BigQuery
Slot management
Cost-efficient querying
Storage and compute optimization
Advanced Optimization Techniques
Partitioning
Clustering
Bucketing
Storage layout optimization
Integrations & Infrastructure
Salesforce data integration
Docker / ECS
CI/CD for data workflows
AI / ML Exposure
Supporting feature pipelines
Understanding data requirements for ML systems

Key Responsibilities
Data Platform Architecture
Design and maintain a scalable real-time and batch data platform
Architect ingestion pipelines using AWS Kinesis and Python
Ensure high availability and reliability of data flows
Real-Time Processing
Enable near-real-time (seconds–minutes latency) data processing
Build systems for operational alerting and anomaly detection
Ensure early detection of monetization and traffic issues
Data Modeling
Transform raw event data into business-ready datasets using dbt
Design scalable, maintainable schemas aligned with product evolution
Performance & Cost Engineering
Optimize SQL queries and storage structures
Design cost-efficient partitioning strategies
Monitor and reduce warehouse and infrastructure costs
Balance real-time and batch processing appropriately
Engineering Excellence
Lead and participate in code reviews
Enforce high standards of performance, security, and maintainability
Improve observability and monitoring across pipelines
Cross-Functional Collaboration
Work closely with Data Analysts and Product Managers
Translate business requirements into scalable technical solutions
Clearly communicate trade-offs between speed, cost, and complexity

Types of Data We Process
User behavior events (page views, clicks, searches, conversions)
Ad & monetization events (impressions, clicks, CTR, attribution)
System and integration logs (latency, errors, rate limits)

Why Real-Time Is Critical
Detect broken ads or impression drops before revenue is lost
Identify traffic anomalies or abuse early
Enable same-day operational intervention
Prevent negative user and advertiser experience
Near-real-time processing (seconds-to-minutes latency) is required for operational awareness.
Batch processing remains important for historical analysis and reporting, but not for incident detection.

Working Schedule
Monday – Friday
16:00 – 00:00 Kyiv time
Full alignment with US-based stakeholders

What We Value
Strong ownership mindset
Strategic thinking about architecture
Focus on scalability, reliability, and cost efficiency
Proactive problem-solving
Clear communication with both technical and non-technical teams
Ability to think beyond “just making it work”