Job VC
Senior DevOps/SRE Engineer
Description
We are looking for a hands-on Senior DevOps/SRE Engineer to own the infrastructure and platform reliability that our real-time trading systems depend on. You will be the person who makes sure that production never surprises us — building and maintaining the pipelines, environments, and observability tooling that allow the engineering team to ship with confidence. You are comfortable managing and operating Cloud and Kubernetes infrastructure, opinionated about CI/CD, and understand that in a live trading environment, uptime and latency are not optional. You bring a disciplined, systematic approach to infrastructure-as-code, treat on-call as ownership — not rotation — and leave things better than you found them.
Domain: FinTech / Proprietary Trading (US market).
Product: A cutting-edge platform connecting traders with funding opportunities.
Tech Vibe: High-load, low-latency, and data-intensive environment processing millions of events in real time.
Requirements:
Deep, hands-on AWS experience: VPC design, EKS, IAM, RDS, S3, CloudWatch, and related services in production environments.
Strong Kubernetes expertise: cluster management, Helm, resource optimization, RBAC, and troubleshooting in real production workloads.
Proven experience building and maintaining CI/CD pipelines with GitHub Actions, including multi-environment promotion flows, secrets management, and deployment gating.
Operational experience with the Grafana logging stack or an equivalent centralized logging platform: index management, retention policies, and log-based alerting.
Hands-on experience with Kafka in production: topic management, consumer group monitoring, offset management, and performance tuning.
Familiarity with ClickHouse or comparable columnar/analytical databases in a high-throughput data environment.
Experience with Grafana for dashboards, alerting, and on-call / IRM workflows, including configuring alert rules, escalation policies, and runbooks.
Infrastructure-as-code fluency (Terraform or equivalent); version-controlled, peer-reviewed infrastructure changes are non-negotiable.
Strong understanding of networking fundamentals: DNS, load balancing, TLS termination, service mesh basics, and ingress patterns.
Experience supporting or operating real-time, latency-sensitive systems; you understand that a five-second lag in a trading environment is not acceptable.
Reliable, thorough, and ownership-oriented — you close issues, document what you learn, and don’t leave alerts untuned or runbooks unwritten.
Upper-Intermediate English level and strong communication skills.
Nice to have:
Background in fintech, prop trading, or other regulated, high-stakes environments where uptime and data correctness are non-negotiable.
Experience with service mesh tooling (Cilium, Istio, Linkerd) for inter-service observability and traffic management.
Familiarity with NATS, Redis, or other low-latency messaging or caching layers common in trading infrastructure.
Scripting proficiency in Python, Go, or Bash for automation, tooling, and operational scripting.
Experience with multi-region or multi-AZ deployment strategies and automated failover patterns.
Comfort contributing to or reviewing application code — not a full-stack engineer, but able to engage meaningfully with developers on deployment, environment, and observability concerns.
Exposure to security tooling: vulnerability scanning, secrets management (Vault, AWS Secrets Manager), and audit logging.
Experience operating in a multi-team engineering organization with shared infrastructure dependencies.
Responsibilities:
Design, maintain, and evolve cloud infrastructure on AWS — including EKS clusters, networking, IAM, and supporting services — with a strong bias toward reliability, security, and cost efficiency.
Own and operate Kubernetes workloads end-to-end: deployment configurations, resource management, autoscaling, and cluster health across dev, staging, and production environments.
Build and maintain CI/CD pipelines using GitHub Actions — from code merge to production deploy — ensuring fast, repeatable, and auditable release flows for all services.
Manage and evolve the central monitoring stack (Grafana Loki based) and observability tooling to ensure engineers can investigate issues quickly and operations teams can act on signals before they become incidents.
Own Grafana IRM and the on-call system: configure alerting, dashboards, and escalation policies; drive incident response culture and post-incident review processes.
Manage and scale Kafka-based event streaming infrastructure supporting real-time trading
What we offer:
Annual paid vacation of 18 working days.
Extra vacation days for long-term cooperation.
Annual paid sick leave of 10 days.
Maternity/Paternity leave.
The opportunity for sabbatical leave.
Marriage and Parenthood Package.
Compensation for sports activities or health insurance coverage (up to $250 per year), available after the trial period.
Career development plan.
English and Spanish classes.
Tax payment and PE (Private Entrepreneur) management.
Technical equipment.
Internal Referral program.
Opportunity to take part in company volunteering activities.
Sombra is a “Friendly to Veterans” award-holder.
If you believe you are a suitable candidate for this position, please attach your updated resume using the provided link.
Our recruitment team will review your profile, and if it aligns with our current job openings, we will contact you shortly. If you don’t receive a reply from us within 5 business days, it means we have decided to move forward with other candidates.
Thank you for understanding!