Machine Learning Ops Engineer (6+ years)
Zorba AI
Job description
We are seeking a highly skilled Senior Data & ML Operations Engineer to manage, monitor, and optimize end-to-end data and machine learning pipelines. The ideal candidate will have strong expertise in Python, Databricks, MLOps, CI/CD automation, production support, data validation, monitoring frameworks, and GenAI-enabled solutions. This role requires close collaboration with data engineering, machine learning, DevOps, and business teams to ensure reliable and scalable data and AI platforms.
Key Responsibilities
Data Pipeline Monitoring & Production Support
- Monitor end-to-end data pipeline execution and ensure successful daily operations.
- Identify, troubleshoot, and resolve pipeline failures, performance bottlenecks, and production issues.
- Execute reruns and recovery procedures to minimize downtime and maintain SLA compliance.
- Collaborate with cross-functional teams to resolve dependencies, blockers, and integration issues.
- Implement preventive health checks, monitoring frameworks, and robust logging mechanisms.
Data Quality & Validation
- Design and maintain dashboards for data validation, reconciliation, and quality monitoring.
- Perform data quality assessments and ensure integrity, consistency, and accuracy of pipeline outputs.
- Develop automated validation frameworks and quality checks across data workflows.
- Build alerts and notification systems for pipeline failures, data anomalies, and operational issues.
Machine Learning Operations (MLOps)
- Monitor model performance using statistical and business metrics.
- Detect and analyze data drift, feature drift, and concept drift across production models.
- Support deployment, monitoring, maintenance, and lifecycle management of ML models.
- Implement model explainability techniques and performance reporting frameworks.
Automation & Agent-Based Solutions
- Develop intelligent agent-based solutions for automated monitoring, troubleshooting, and debugging.
- Leverage Generative AI technologies for operational insights, issue summarization, and root cause analysis.
- Automate repetitive operational tasks to improve platform reliability and efficiency.
CI/CD & Platform Engineering
- Design, enhance, and maintain CI/CD pipelines for data and ML workloads.
- Implement secure authentication mechanisms, including data-based authentication workflows.
- Build and optimize deployment pipelines, release processes, and infrastructure automation.
- Support DevOps best practices for version control, testing, deployment, and monitoring.
Stakeholder Management
- Communicate project status, risks, incidents, and resolutions effectively to stakeholders.
- Ensure timely delivery of operational and project commitments.
- Participate in incident management, root cause analysis, and continuous improvement initiatives.
Required Skills
Technical Skills
- Python
- Databricks
- SQL
- Data Engineering & Data Processing
- Machine Learning Engineering
- MLOps
- CI/CD Pipeline Development
- Monitoring & Production Support
- Data Validation & Data Quality Management
- Logging & Observability Tools
- Dashboard Development & Reporting
- Statistical Analysis & Model Monitoring
- Model Explainability Techniques
- Generative AI Applications
- Automation & Agent-Based Systems
Preferred Skills
- Azure DevOps, GitHub Actions, Jenkins, or similar CI/CD tools
- MLflow
- Apache Spark / PySpark
- Cloud Platforms (Azure, AWS, or GCP)
- Monitoring tools such as Datadog, Grafana, Prometheus, or equivalent
- Experience with LLMs and GenAI frameworks
Qualifications
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.
- 5+ years of experience in Data Engineering, MLOps, Production Support, or ML Platform Engineering.
- Proven experience managing production-scale data and machine learning systems.
- Strong analytical, troubleshooting, and communication skills.
Skills: machine learning, Apache Spark, pipeline
Source: LinkedIn