Machine Learning Ops & Infrastructure Engineer [USA project]

Newsoft · dou · Not specified · Lviv

Technologies

AI AWS Apache Airflow Bash Docker ETL GCP Git Kubernetes PyTorch Python Telemetry Terraform

Description

We are looking for an experienced ML Ops & Infrastructure Engineer to join an AI-oriented project that builds technology for automating complex processes in heavy industry, from construction and mining to the energy sector.
The team is developing reliable AI and robotics-oriented systems that help people perform difficult and potentially dangerous work more efficiently and safely. The project combines software engineering, AI infrastructure, and real-world hardware-oriented challenges, where reliability, scalability, and the ability to work with high-load ML workflows are essential.
In this role, you will work closely with Research and Engineering teams to build infrastructure for data processing, model training, evaluation, and deployment of ML models.

Requirements:

— 3+ years of commercial experience in ML infrastructure, MLOps, Data Engineering, or related backend / infrastructure roles.
— Strong experience with AWS and/or GCP.
— Hands-on experience with Docker and Kubernetes (K8s).
— Strong Python skills and solid software engineering fundamentals.
— Experience building and maintaining scalable data pipelines and workflow orchestration systems.
— Experience with orchestration / workflow tools such as Airflow, Argo, Kubeflow, or similar.
— Understanding of distributed compute environments and ML training workflows.
— Experience with Git and bash scripting.
— Strong problem-solving skills and the ability to work independently in a dynamic environment.
— Upper-Intermediate or higher English proficiency.

Nice to have:

— Experience with Terraform or other Infrastructure as Code tools.
— Familiarity with distributed training frameworks (PyTorch DDP, Ray, Horovod, etc.).
— Experience with model monitoring, observability, or data drift detection.
— Experience working with large volumes of unstructured data (video, sensor data, spatial data).
— Background in AI-heavy domains such as computer vision, autonomous systems, robotics, or similar fields.

Responsibilities:

— Design, build, and maintain scalable ML infrastructure and internal platforms.
— Develop reliable data ingestion, processing, and versioning pipelines.
— Build and optimize environments for distributed model training and evaluation.
— Manage cloud compute workloads across AWS and/or GCP.
— Containerize and orchestrate ML workloads using Docker and Kubernetes.
— Collaborate closely with ML researchers and software engineers to improve development workflows.
— Improve infrastructure reliability, scalability, and deployment processes.
— Build tooling and systems that accelerate AI research and production readiness.

Interview stages:

— HR interview (30 minutes).
— Technical interview (1 hour).
— Interview with the customer (1 hour).
— Job offer 🎉🎉🎉

We offer:

— Competitive salary.
— Challenging tasks and projects.
— Opportunities for professional development and growth.
— Flexible working hours.
— New hardware.
— Free English lessons.
— Table tennis and yoga classes.
— Backup generators and Starlink internet in the office.
— Small gym in the office.
— Height-adjustable desks.
— Regular office fruit delivery and other benefits.

Let’s create value together!