The machine learning operations (MLOps) maturity model defines principles and practices to help you build and operate production machine learning environments. Use this model to assess your current state and plan incremental progress toward a mature MLOps environment.
Maturity model overview
The MLOps maturity model clarifies the development operations (DevOps) principles and practices required to run a successful MLOps environment. It provides a framework to measure your organization's MLOps capabilities and identify gaps in your current implementation. Use this model to develop your MLOps capability gradually instead of confronting the full complexity of a mature implementation up front.
Use the MLOps maturity model as a guide to do the following tasks:
- Estimate the scope of the work for new engagements.
- Establish realistic success criteria.
- Identify deliverables to hand over at the end of the engagement.
Like most maturity models, the MLOps maturity model qualitatively assesses people and culture, processes and structures, and objects and technology. As the maturity level increases, the likelihood that incidents or errors lead to improvements in development and production processes also increases.
The MLOps maturity model encompasses five levels of technical capability.

| Level | Description | Highlights | Technology |
| --- | --- | --- | --- |
| 0 | No MLOps | - Full machine learning model life cycle is difficult to manage.<br>- Teams are disparate and releases are challenging.<br>- Most systems are opaque, with little feedback during and after deployment. | - Builds and deployments are manual.<br>- Model and application testing is manual.<br>- Model performance tracking isn't centralized.<br>- Model training is manual.<br>- Teams use only basic Azure Machine Learning workspace features. |
| 1 | DevOps but no MLOps | - Releases are less challenging than at Level 0 but still rely on the data team for every new model.<br>- Feedback about model performance in production is still limited.<br>- Results are difficult to trace and reproduce. | - Builds are automated.<br>- Application code has automated tests.<br>- Code is version controlled. |
| 2 | Automated training | - Training environment is fully managed and traceable.<br>- Model is easy to reproduce.<br>- Releases are manual but easy to implement. | - Model training is automated.<br>- Model training performance tracking is centralized.<br>- Model management is in place.<br>- Machine Learning scheduled or event-driven jobs handle recurring training.<br>- Managed feature store is adopted.<br>- Azure Event Grid life cycle events are emitted for pipeline orchestration.<br>- Environments are managed by using Machine Learning environment definitions (see the sketch after this table). |
| 3 | Automated model deployment | - Releases are easy to implement and automatic.<br>- Full traceability exists from deployment back to original data.<br>- Entire environment is managed, including training, testing, and production. | - A/B testing of model performance is integrated for deployment.<br>- All code has automated tests.<br>- Model training performance tracking is centralized.<br>- Artifacts are promoted across workspaces by using Machine Learning registries. |
| 4 | Full MLOps automated operations | - Full system is automated and easily monitored.<br>- Production systems provide information about how to improve, and sometimes automatically improve with new models.<br>- System is approaching zero downtime. | - Model training and testing are automated.<br>- Deployed model emits verbose, centralized metrics.<br>- Drift or regression signals trigger automatic retraining by using Event Grid.<br>- Feature materialization health and freshness are monitored.<br>- Model promotion is policy-based and automated by using Machine Learning registries. |
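Several Level 2 and later items in the preceding table map to concrete Azure Machine Learning constructs. For example, managing environments through Machine Learning environment definitions can look like the following minimal sketch, which assumes the Python SDK v2 (`azure-ai-ml`); the workspace identifiers, base image, conda file, and names are placeholders rather than prescribed values.

```python
# Minimal sketch: a versioned environment definition so that training and
# scoring run against a pinned, reproducible dependency set. The workspace
# identifiers, base image, conda file path, and names are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Environment
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

sklearn_env = Environment(
    name="sklearn-env",
    description="Pinned dependencies for churn model training and scoring",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="./environments/sklearn-env.yml",
)
ml_client.environments.create_or_update(sklearn_env)
```

Because the environment is registered and versioned, later sketches in this article can refer to it by name and version instead of rebuilding dependencies per run.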
The following tables describe detailed characteristics for each level of maturity.
Level 0: No MLOps

| People | Model creation | Model release | Application integration |
| --- | --- | --- | --- |
| - Data scientists work in isolation without regular communication with the larger team.<br>- Data engineers (if they exist) work in isolation without regular communication with the larger team.<br>- Software engineers work in isolation and receive models remotely from other team members. | - Data is gathered manually.<br>- Compute is likely not managed.<br>- Experiments aren't tracked consistently.<br>- End result is typically a single model file that includes inputs and outputs, handed off manually. | - Release process is manual.<br>- Scoring script is created manually after experiments and isn't version controlled.<br>- A single data scientist or data engineer handles release. | - Implementation depends heavily on data scientist expertise.<br>- Application releases are manual. |
Level 1: DevOps but no MLOps

| People | Model creation | Model release | Application integration |
| --- | --- | --- | --- |
| - Data scientists work in isolation without regular communication with the larger team.<br>- Data engineers (if they exist) work in isolation without regular communication with the larger team.<br>- Software engineers work in isolation and receive models remotely from other team members. | - Data pipeline automatically gathers data.<br>- Compute might or might not be managed.<br>- Experiments aren't tracked consistently.<br>- End result is typically a single model file that includes inputs and outputs, handed off manually. | - Release process is manual.<br>- Scoring script is created manually after experiments but is likely version controlled.<br>- Model is handed off to software engineers. | - Basic integration tests exist for the model.<br>- Implementation depends heavily on data scientist expertise.<br>- Application releases are automated.<br>- Application code has unit tests (see the test sketch after this table). |
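At this level the scoring script is version controlled and application code carries automated tests, even though the model itself is still handed off manually. The following is a minimal sketch of such a test, assuming a hypothetical `score.py` module that exposes `init()` and `run(raw_data)` functions (a convention many Azure Machine Learning scoring scripts follow); the module name, payload shape, and error behavior are illustrative assumptions, not part of the maturity model.

```python
# test_score.py: illustrative unit test for a hypothetical, version-controlled
# scoring script. Assumes score.py defines init() and run(raw_data); adjust
# names, payload shape, and assertions to match your repository.
import json

import pytest

import score  # hypothetical scoring module checked into the same repository


@pytest.fixture(scope="module", autouse=True)
def loaded_model():
    # init() is expected to load the model once before the tests run.
    score.init()


def test_run_returns_one_prediction_per_row():
    payload = json.dumps({"data": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]})
    predictions = score.run(payload)
    assert len(predictions) == 2


def test_run_rejects_malformed_input():
    # Assumes run() raises on malformed input; if yours returns an error
    # object instead, assert on that rather than expecting an exception.
    with pytest.raises(Exception):
        score.run("not valid json")
```

Running tests like these in the build pipeline is what separates Level 1 from Level 0: the application can be rebuilt and released automatically, even though producing a new model still requires the data team.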
Level 2: Automated training

| People | Model creation | Model release | Application integration |
| --- | --- | --- | --- |
| - Data scientists work directly with data engineers to convert experimentation code into repeatable scripts and jobs.<br>- Data engineers work with data scientists on model development.<br>- Software engineers work in isolation and receive models remotely from other team members. | - Data pipeline automatically gathers data.<br>- Compute is managed.<br>- Experiment results are tracked (see the scheduled training sketch after this table).<br>- Training code and models are both version controlled. | - Release process is manual.<br>- Scoring script is version controlled and has tests.<br>- Software engineering team manages releases. | - Basic integration tests exist for the model.<br>- Implementation depends heavily on data scientist expertise.<br>- Application code has unit tests. |
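Level 2 makes training repeatable: experiments are tracked centrally, and recurring training runs on managed compute without a manual kickoff. The following is a minimal sketch of a scheduled training job, assuming the Azure Machine Learning Python SDK v2 (`azure-ai-ml`); the workspace identifiers, data asset, environment, compute target, and `train.py` script are placeholders.

```python
# Minimal sketch: recurring, managed training with the Azure Machine Learning
# Python SDK v2. Workspace identifiers, data asset, environment, compute, and
# train.py are placeholders to replace with your own.
from azure.ai.ml import Input, MLClient, command
from azure.ai.ml.entities import JobSchedule, RecurrenceTrigger
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# The training code in ./src is snapshotted with each run, and train.py (not
# shown) is assumed to log parameters and metrics with MLflow, so every
# scheduled run is tracked in the workspace.
train_job = command(
    code="./src",
    command="python train.py --training-data ${{inputs.training_data}}",
    inputs={
        "training_data": Input(type="uri_folder", path="azureml:customer-churn-data:1"),
    },
    environment="azureml:sklearn-env:1",
    compute="cpu-cluster",
    experiment_name="churn-weekly-training",
)

# A weekly recurrence removes the dependency on a person to start retraining.
schedule = JobSchedule(
    name="churn-weekly-training",
    trigger=RecurrenceTrigger(frequency="week", interval=1),
    create_job=train_job,
)
ml_client.schedules.begin_create_or_update(schedule).result()
```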
Level 3: Automated model deployment

| People | Model creation | Model release | Application integration |
| --- | --- | --- | --- |
| - Data scientists work directly with data engineers to convert experimentation code into repeatable scripts and jobs.<br>- Data engineers work with data scientists and software engineers to manage inputs and outputs.<br>- Software engineers work with data engineers to automate model integration into application code. | - Data pipeline automatically gathers data.<br>- Compute is managed.<br>- Experiment results are tracked.<br>- Training code and models are both version controlled. | - Release process is automatic.<br>- Scoring script is version controlled and has tests.<br>- Continuous integration and continuous delivery (CI/CD) pipeline manages releases (see the deployment sketch after this table). | - Each model release includes unit and integration tests.<br>- Implementation is less dependent on data scientist expertise.<br>- Application code has unit and integration tests. |
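One way to implement automatic releases with integrated A/B testing is a managed online endpoint that splits traffic between the current and candidate model versions. The following is a minimal sketch, again assuming the Azure Machine Learning Python SDK v2 and MLflow-format registered models (so no scoring script or environment needs to be attached); the endpoint, deployment, and model names are placeholders, and a CI/CD pipeline would normally run these steps rather than a person.

```python
# Minimal sketch: blue/green deployments behind one managed online endpoint,
# with a small traffic share routed to the candidate model for comparison.
# Assumes MLflow-format registered models; names and versions are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

endpoint = ManagedOnlineEndpoint(name="churn-endpoint", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# "blue" serves the current model version, "green" serves the candidate.
for deployment_name, model in [("blue", "azureml:churn-model:3"), ("green", "azureml:churn-model:4")]:
    deployment = ManagedOnlineDeployment(
        name=deployment_name,
        endpoint_name="churn-endpoint",
        model=model,
        instance_type="Standard_DS3_v2",
        instance_count=1,
    )
    ml_client.online_deployments.begin_create_or_update(deployment).result()

# Route 10% of traffic to the candidate; promote it only if its centrally
# tracked performance metrics hold up against the current version.
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```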
Level 4: Full MLOps automated operations

| People | Model creation | Model release | Application integration |
| --- | --- | --- | --- |
| - Data scientists work directly with data engineers to convert experimentation code into repeatable scripts and jobs. They also work with software engineers to identify data markers.<br>- Data engineers work with data scientists and software engineers to manage inputs and outputs.<br>- Software engineers work with data engineers to automate model integration and implement post-deployment metrics gathering. | - Data pipeline automatically gathers data.<br>- Production metrics automatically trigger retraining (see the retraining sketch after this table).<br>- Compute is managed.<br>- Experiment results are tracked.<br>- Training code and models are both version controlled. | - Release process is automatic.<br>- Scoring script is version controlled and has tests.<br>- CI/CD pipeline manages releases. | - Each model release includes unit and integration tests.<br>- Implementation is less dependent on data scientist expertise.<br>- Application code has unit and integration tests. |
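Level 4 closes the loop: drift or regression signals in production trigger retraining without a person in the path. One common shape is an event handler, for example an Azure Function subscribed to Event Grid, that submits a new training job when a drift alert arrives. The sketch below is illustrative only; the event payload field, threshold, and job details are assumptions, not a documented contract.

```python
# Illustrative sketch of drift-triggered retraining: an event handler (for
# example, an Azure Function behind an Event Grid subscription) inspects a
# drift alert and resubmits the training job. The payload field name and the
# threshold are assumptions made for illustration.
from azure.ai.ml import Input, MLClient, command
from azure.identity import DefaultAzureCredential

DRIFT_THRESHOLD = 0.25  # hypothetical tolerance agreed with the team


def handle_drift_event(event_payload: dict) -> None:
    """Submit a retraining job when the reported drift exceeds the threshold."""
    drift = float(event_payload.get("driftCoefficient", 0.0))  # assumed field name
    if drift <= DRIFT_THRESHOLD:
        return

    ml_client = MLClient(
        credential=DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<workspace>",
    )
    retrain_job = command(
        code="./src",
        command="python train.py --training-data ${{inputs.training_data}}",
        inputs={
            "training_data": Input(type="uri_folder", path="azureml:customer-churn-data:1"),
        },
        environment="azureml:sklearn-env:1",
        compute="cpu-cluster",
        experiment_name="churn-drift-retraining",
        tags={"trigger": "drift", "drift_value": str(drift)},
    )
    ml_client.jobs.create_or_update(retrain_job)
```

Tagging the submitted job with the triggering signal keeps the retraining decision traceable, which is what distinguishes automated operations from unattended ones.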
MLOps and GenAIOps
This article focuses on the life cycle of predictive, tabular, and classical machine learning models. Generative AI operations (GenAIOps) introduces extra capabilities that complement the MLOps maturity levels rather than replace them, such as prompt life cycle management, retrieval augmentation, output safety, and token cost governance. For more information, see GenAIOps for organizations that have MLOps investments. Don't confuse prompt iteration mechanics with the reproducible training and deployment loop that this article describes.
Next steps