Machine Learning Ops Engineer (6+ years)

Zorba AI

IN
Salary: Not disclosed
June 19, 2026

Job description

We are seeking a highly skilled Senior Data & ML Operations Engineer to manage, monitor, and optimize end-to-end data and machine learning pipelines. The ideal candidate will have strong expertise in Python, Databricks, MLOps, CI/CD automation, production support, data validation, monitoring frameworks, and GenAI-enabled solutions. This role requires close collaboration with data engineering, machine learning, DevOps, and business teams to ensure reliable and scalable data and AI platforms.

Key Responsibilities

Data Pipeline Monitoring & Production Support

  • Monitor end-to-end data pipeline execution and ensure successful daily operations.
  • Identify, troubleshoot, and resolve pipeline failures, performance bottlenecks, and production issues.
  • Execute reruns and recovery procedures to minimize downtime and maintain SLA compliance.
  • Collaborate with cross-functional teams to resolve dependencies, blockers, and integration issues.
  • Implement preventive health checks, monitoring frameworks, and robust logging mechanisms.

Data Quality & Validation

  • Design and maintain dashboards for data validation, reconciliation, and quality monitoring.
  • Perform data quality assessments and ensure integrity, consistency, and accuracy of pipeline outputs.
  • Develop automated validation frameworks and quality checks across data workflows.
  • Build alerts and notification systems for pipeline failures, data anomalies, and operational issues.

Machine Learning Operations (MLOps)

  • Monitor model performance using statistical and business metrics.
  • Detect and analyze data drift, feature drift, and concept drift across production models.
  • Support deployment, monitoring, maintenance, and lifecycle management of ML models.
  • Implement model explainability techniques and performance reporting frameworks.

Automation & Agent-Based Solutions

  • Develop intelligent agent-based solutions for automated monitoring, troubleshooting, and debugging.
  • Leverage Generative AI technologies for operational insights, issue summarization, and root cause analysis.
  • Automate repetitive operational tasks to improve platform reliability and efficiency.

CI/CD & Platform Engineering

  • Design, enhance, and maintain CI/CD pipelines for data and ML workloads.
  • Implement secure authentication mechanisms, including data-based authentication workflows.
  • Build and optimize deployment pipelines, release processes, and infrastructure automation.
  • Support DevOps best practices for version control, testing, deployment, and monitoring.

Stakeholder Management

  • Communicate project status, risks, incidents, and resolutions effectively to stakeholders.
  • Ensure timely delivery of operational and project commitments.
  • Participate in incident management, root cause analysis, and continuous improvement initiatives.

Required Skills

Technical Skills

  • Python
  • Databricks
  • SQL
  • Data Engineering & Data Processing
  • Machine Learning Engineering
  • MLOps
  • CI/CD Pipeline Development
  • Monitoring & Production Support
  • Data Validation & Data Quality Management
  • Logging & Observability Tools
  • Dashboard Development & Reporting
  • Statistical Analysis & Model Monitoring
  • Model Explainability Techniques
  • Generative AI Applications
  • Automation & Agent-Based Systems

Preferred Skills

  • Azure DevOps, GitHub Actions, Jenkins, or similar CI/CD tools
  • MLflow
  • Apache Spark / PySpark
  • Cloud Platforms (Azure, AWS, or GCP)
  • Monitoring tools such as Datadog, Grafana, Prometheus, or equivalent
  • Experience with LLMs and GenAI frameworks

Qualifications

  • Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.
  • 5+ years of experience in Data Engineering, MLOps, Production Support, or ML Platform Engineering.
  • Proven experience managing production-scale data and machine learning systems.
  • Strong analytical, troubleshooting, and communication skills.

Skills: machine learning, apache spark, pipeline