1001 Remote Jobs -- MLOps Engineer (LLM Infrastructure)

Latest remote job openings

Today: 07-Feb-2026 13:03 GMT

Job details
Job title:	MLOps Engineer (LLM Infrastructure)
Posted by:	External vacancy from jobs.dou.ua
Published:	15-Aug-2025 15:13 GMT
Company:	Kyivstar.Tech
Description:	We are hiring an MLOps Engineer specializing in Large Language Model (LLM) infrastructure to design and maintain the robust platform on which our AI models are developed, deployed, and monitored. As an MLOps Engineer, you will build the backbone of our machine learning operations, from scalable training pipelines to reliable deployment systems, ensuring that our NLP models (including LLMs) can be trained on large datasets and served to end users efficiently. This role sits at the intersection of software engineering, DevOps, and machine learning, and is crucial for accelerating our R&D in the Ukrainian LLM project. You’ll work closely with data scientists and software engineers to implement best-in-class infrastructure and workflows for the continuous delivery of AI innovations.

About us
Kyivstar.Tech is a Ukrainian hybrid IT company and a resident of Diia.City. We are a subsidiary of Kyivstar, one of Ukraine’s largest telecom operators. Our mission is to change lives in Ukraine and around the world by creating technological solutions and products that unleash the potential of businesses and meet users’ needs. Over 500 KS.Tech specialists work daily in various areas: mobile and web solutions, as well as design, development, support, and technical maintenance of high-performance systems and services. We believe in innovations that truly bring quality changes and constantly challenge conventional approaches and solutions. Each of us shares an entrepreneurial mindset that keeps us moving, evolving, and creating something new.

What you will do
* Design and implement modern, scalable ML infrastructure (cloud-native or on-premises) to support both experimentation and production deployment of NLP/LLM models. This includes setting up systems for distributed model training (leveraging GPUs or TPUs across multiple nodes) and high-throughput model serving (APIs, microservices).
* Develop end-to-end pipelines for model training, validation, and deployment. Automate the ML workflow from data ingestion and feature processing to model training and evaluation, using technologies like Docker and CI/CD pipelines to ensure reproducibility and reliability.
* Collaborate with Data Scientists and ML Engineers to design MLOps solutions that meet model performance and latency requirements. Architect deployment patterns (batch, real-time, streaming inference) appropriate for various use cases (e.g., a real-time chatbot vs. offline analysis).
* Implement and uphold best practices in MLOps, including automated testing of ML code, continuous integration/continuous deployment for model updates, and rigorous version control for code, data, and model artifacts. Ensure every model and dataset is properly versioned and reproducible.
* Set up monitoring and alerting for deployed models and data pipelines. Track model performance (latency, throughput) and accuracy drift in production. Implement logging and observability frameworks to quickly detect anomalies or degradations in model outputs.
* Manage and optimize our Kubernetes-based deployment environments. Containerize ML services and use orchestration (Kubernetes, Docker Swarm, or similar) to scale model serving infrastructure. Handle cluster provisioning, health, and upgrades, possibly using Helm charts for managing LLM services.
* Maintain infrastructure-as-code (e.g., Terraform, Ansible) for provisioning cloud resources and ML infrastructure, enabling reproducible and auditable changes to the environment. Ensure our infrastructure is scalable, cost-effective, and secure.
* Perform code reviews and guide other engineers (both MLOps and ML developers) on building efficient and maintainable pipelines. Troubleshoot issues across the ML lifecycle, from data processing bottlenecks to model deployment failures, and continuously improve system robustness.
Qualifications and experience needed

Experience & Background:
* 4+ years of experience in DevOps, MLOps, or ML Infrastructure roles
* Strong foundation in software engineering and DevOps principles as they apply to machine learning
* A Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field is preferred

Cloud & Infrastructure:
* Extensive experience with cloud platforms (AWS, GCP, or Azure) and designing cloud-native applications for ML
* Comfortable using cloud services for compute (EC2, GCP Compute Engine, Azure VMs), storage (S3, Cloud Storage), container registries, and serverless components where appropriate
* Experience managing infrastructure with Infrastructure-as-Code tools like Terraform or CloudFormation

Containerization & Orchestration:
* Proficiency in container technologies (Docker) and orchestration with Kubernetes
* Ability to deploy, scale, and manage complex applications on Kubernetes clusters; experience with tools like Helm for Kubernetes package management
* Knowledge of container security and networking basics in distributed systems

CI/CD & Automation:
* Strong experience implementing CI/CD pipelines for ML projects
* Familiarity with tools like Jenkins, GitLab CI, or GitHub Actions for automating testing and deployment of ML code and models
* Experience with specialized ML CI/CD (e.g., TensorFlow Extended (TFX), MLflow for model deployment) and GitOps workflows (Argo CD) is a plus

Programming & Scripting:
* Strong coding skills in Python, with experience writing pipelines or automation scripts for ML tasks
* Familiarity with shell scripting and one or more general-purpose languages (Go, Java, or C++) for infrastructure tooling
* Ability to debug and optimize code for performance (both in data pipelines and in model inference code)

ML Pipeline Knowledge:
* Solid understanding of the machine learning lifecycle and tools
* Experience building or maintaining ML pipelines, possibly using frameworks like Kubeflow, Airflow, or custom solutions
* Knowledge of model serving frameworks (TensorFlow Serving, TorchServe, NVIDIA Triton, or custom Flask/FastAPI servers for ML)

Monitoring & Reliability:
* Experience setting up monitoring for applications and models (using Prometheus, Grafana, CloudWatch, or similar) and implementing alerting for anomalies
* Understanding of model performance metrics and how to track them in production (e.g., accuracy on a validation stream, response latency)
* Familiarity with A/B testing or canary deployments for model updates in production

Security & Compliance:
* Basic understanding of security best practices in ML deployments, including data encryption, access control, and handling sensitive data in compliance with regulations
* Experience implementing authentication/authorization for model endpoints and ensuring infrastructure complies with organizational security policies

Team Collaboration:
* Excellent collaboration skills to work with cross-functional teams
* Experience working with data scientists to translate model requirements into scalable infrastructure
* Strong documentation habits: system designs, operational runbooks, and lessons learned

A plus would be:

LLM/AI Domain Experience:
* Previous experience deploying or fine-tuning large language models or other large-scale deep learning models in production
* Knowledge of specialized optimizations for LLMs (such as model parallelism, 8-bit or 4-bit quantization, and libraries like DeepSpeed or Hugging Face Accelerate for efficient training) will be highly regarded

Distributed Computing:
* Experience with distributed computing frameworks such as Ray for scaling model training across multiple nodes
* Familiarity with big data processing (Spark, Hadoop) and streaming data (Kafka, Flink) to feed data into ML systems in real time

Data Engineering Tools:
* Some experience with data pipelines and ETL
* Knowledge of tools like Apache Airflow, Kafka, or dbt, and how they integrate into ML pipelines
* Understanding of data warehousing concepts (Snowflake, BigQuery) and how processed data is used for model training

Versioning & Experiment Tracking:
* Experience with ML experiment tracking and model registry tools (e.g., MLflow, Weights & Biases, DVC)
* Ensuring that every model version and experiment is logged and reproducible for auditing and improvement cycles

Vector Databases & Retrieval:
* Familiarity with vector databases and similarity-search libraries (Pinecone, Weaviate, FAISS) and the retrieval systems used with LLMs for retrieval-augmented generation

High-Performance Computing:
* Exposure to HPC environments or on-prem GPU clusters for training large models
* Understanding of how to maximize GPU utilization, manage job scheduling (with tools like Slurm or Kubernetes operators for ML), and profile model performance to remove bottlenecks

Continuous Learning:
* Up to date with the latest developments in MLOps and LLMOps (Large Language Model Operations)
* Active interest in new tools and frameworks in the MLOps ecosystem (e.g., model optimization libraries, new orchestration tools) and a drive to evaluate and introduce them to improve our processes

What we offer
* Office or remote: it’s up to you. You can work from anywhere, and we will arrange your workplace
* Remote onboarding
* Performance bonuses
* Learning opportunities through the company’s library, internal resources, and partner programs
* Health and life insurance
* Wellbeing program and corporate psychologist
* Reimbursement of Kyivstar mobile communication expenses
Job ID:	138131
Required skills:	Bigdata, Cloud, Cpp, Devops, Java, Python
Salary:
Region:	Kyiv, remote
