I’ve worked on several MLOps projects in which we built end-to-end pipelines, from data preparation all the way to deploying ML models. In this article, I share what I learned about MLOps in a project whose end result was an object detection application. The goal was to build scalable MLOps that would let the ML model learn over time and make retraining of the model automatic.
You will learn about:
- Steps in ML lifecycle and possible technical stack for each step
- Implementing scalable MLOps – what can be automated and how
- Benefits of scalable MLOps
In the project, we focused on building an ML pipeline that is triggered every time that there is enough new training data available and automatically deploys a new version of the model. The pipeline needed to take care of the entire model lifecycle, covering:
- Data preparation and standardization, which usually take a lot of the AI scientists’ time
- Training and testing the model
- Packaging and containerizing the model and its library dependencies
- Model validation, which means trying to reach a good level of accuracy that meets business goals. This may require tuning hyperparameters of the model such as learning rate, number of epochs and number of network layers, and repeating the training process several times.
- Deployment and monitoring of the model, meaning that the AI scientists and engineers bring the model to production and monitor its performance over time.
- Retraining the model, which brings us back to the training step and happens as new training data comes in.
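The lifecycle steps above can be sketched as a chain of small functions behind a data-volume trigger. This is only a minimal illustration, not the project's actual pipeline: the step bodies are placeholders, and names such as `MIN_NEW_SAMPLES`, `run_pipeline` and `maybe_retrain` are hypothetical.

```python
# Hypothetical threshold: how many new labelled samples justify a new run.
MIN_NEW_SAMPLES = 500


def prepare_data(ctx):
    # Placeholder for cleaning and standardizing the raw data.
    return {**ctx, "dataset": f"prepared:{ctx['new_samples']}"}


def train_and_test(ctx):
    # Placeholder for training and testing the model on the prepared dataset.
    return {**ctx, "model": f"model-from-{ctx['dataset']}"}


def package(ctx):
    # Placeholder for bundling the model with its library dependencies.
    return {**ctx, "image": f"image-of-{ctx['model']}"}


def validate(ctx):
    # Placeholder for checking the model against the business target.
    return {**ctx, "validated": True}


def deploy(ctx):
    # Only a validated model is released.
    return {**ctx, "deployed": ctx["validated"]}


STEPS = [prepare_data, train_and_test, package, validate, deploy]


def run_pipeline(steps, context):
    """Run each step in order, feeding the output of one into the next."""
    completed = []
    for step in steps:
        context = step(context)
        completed.append(step.__name__)
    return {**context, "completed": completed}


def maybe_retrain(new_samples, threshold=MIN_NEW_SAMPLES):
    """Trigger the pipeline only when enough new training data is available."""
    if new_samples < threshold:
        return None
    return run_pipeline(STEPS, {"new_samples": new_samples})
```

Because every step only reads from and writes to the shared context, a single step (for example `deploy`) can be swapped out without touching the others.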
MLOps enabling the end-to-end repeatable ML pipeline
Our challenge was to build an end-to-end repeatable ML pipeline that would serve the computer vision-powered end product. We solved this by building a solid MLOps solution.
The overall goal of MLOps is to make the process of productizing ML models smoother. MLOps applies DevOps techniques, concepts, and practices to machine learning systems, with added attention to data-related concerns such as data versioning, data lineage, and data quality. It also responds to the increased need for model observability and for monitoring model performance. If you want to read more about the human side of ML development, read this article.
Having robust MLOps in place gave us AI engineers the opportunity to try different deployment options without changing other components of the pipeline. The MLOps architecture and way of working gave us independently executable steps, which saved a lot of time and effort in the building phase.
Below we have described a typical MLOps architecture that illustrates all the steps that the MLOps infrastructure may need to cover, depending on the requirements of ML R&D and utilization.
Implementing scalable MLOps
MLOps can be automated with proper control steps in order to have good governance over training, evaluating, and monitoring of the model. By introducing those steps we can bring in the right people who should be involved in the lifecycle of an ML product.
Preparing the data
It is also possible to automate or semi-automate data preparation. At Silo AI, we’ve built ML pipelines with our own productized solution to support automated annotations, or partly annotated data, with a human in the loop validating the trickiest annotations.
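As a rough illustration of the human-in-the-loop idea, automatically generated annotations can be split by confidence so that only the uncertain ones reach a human reviewer. This is a sketch under assumptions, not the productized solution mentioned above: the `confidence` field, the `0.8` threshold and the function name are hypothetical.

```python
def split_for_review(annotations, min_confidence=0.8):
    """Separate automatic annotations into accepted ones and ones for a human."""
    auto_accepted, needs_review = [], []
    for annotation in annotations:
        if annotation["confidence"] >= min_confidence:
            auto_accepted.append(annotation)
        else:
            # Low-confidence (tricky) annotations go to the human reviewer.
            needs_review.append(annotation)
    return auto_accepted, needs_review


if __name__ == "__main__":
    proposed = [
        {"image": "img_001.jpg", "label": "car", "confidence": 0.97},
        {"image": "img_002.jpg", "label": "truck", "confidence": 0.41},
    ]
    accepted, review_queue = split_for_review(proposed)
    print(len(accepted), "accepted,", len(review_queue), "sent to review")
```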
Thanks to MLOps, you can keep track of changes in data, models, and hyperparameters, and compare models on their accuracy. It is possible to treat each dataset as an artifact, which lets us version our datasets and keep track of which version of the data was used to train and evaluate a specific version of the model. This also affects the scalability of the product.
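One simple way to treat a dataset as a versioned artifact is to derive its version from its contents and store that version next to every trained model. The sketch below uses only the Python standard library; the function names and the in-memory `registry` list are hypothetical stand-ins for whatever tracking tool a project actually uses.

```python
import hashlib
from typing import Dict, List


def dataset_version(files: Dict[str, bytes]) -> str:
    """Derive a short version id from file names and file contents."""
    digest = hashlib.sha256()
    for name in sorted(files):  # sorted so that file order does not change the id
        content_hash = hashlib.sha256(files[name]).hexdigest()
        digest.update(f"{name}:{content_hash}\n".encode("utf-8"))
    return digest.hexdigest()[:12]


def record_run(registry: List[dict], model_version: str, data_version: str,
               hyperparameters: dict, accuracy: float) -> dict:
    """Store which data and hyperparameters produced which model."""
    entry = {
        "model_version": model_version,
        "data_version": data_version,
        "hyperparameters": hyperparameters,
        "accuracy": accuracy,
    }
    registry.append(entry)
    return entry


def best_run(registry: List[dict]) -> dict:
    """Compare recorded models by accuracy and return the best one."""
    return max(registry, key=lambda entry: entry["accuracy"])
```

Any change to a file's name or contents yields a new dataset version, so each record shows exactly which data a given model was trained on.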
Retraining and deployment of the retrained model
One challenge in this project was that the ML model had to be converted to be compatible with the deployment environment (the final application). Because the deployment environment had specific requirements for machine learning models, we needed a conversion step between the trained model and the model that the application could use. In the end, we chose to run scheduled trainings once we had enough new data. We created a reusable pipeline and kept track of the changes with versioning. The conversion became part of the work of the mobile application developer, who would fetch the trained model and deploy it on the application side.
Depending on the project and its needs, you may have several possibilities for the retraining phase. In my view, a good option is either to schedule retraining or to retrain the model when its predictive power starts to degrade or the world around us changes. In real-world scenarios, changing business needs will also affect these decisions.
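The two retraining options can be combined into one decision function. The following is a minimal sketch under assumptions: the 30-day schedule, the 5% relative drop and the name `should_retrain` are hypothetical defaults that would be set per project.

```python
from datetime import datetime, timedelta


def should_retrain(last_trained, now, baseline_score, current_score,
                   max_age=timedelta(days=30), max_relative_drop=0.05):
    """Return (decision, reason) for a scheduled or degradation-based retrain."""
    # Option 1: scheduled retraining once the model is old enough.
    if now - last_trained >= max_age:
        return True, "scheduled"
    # Option 2: retrain when the score has dropped too far from its baseline.
    if baseline_score > 0:
        relative_drop = (baseline_score - current_score) / baseline_score
        if relative_drop > max_relative_drop:
            return True, "degraded"
    return False, "healthy"


if __name__ == "__main__":
    trained = datetime(2024, 1, 1)
    print(should_retrain(trained, datetime(2024, 1, 10), 0.80, 0.70))
```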
In each retraining, we create a new version of the ML model, which needs to be versioned and validated every time. These steps can be automated, or AI scientists can validate the model manually.
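An automated validation step can be as simple as a gate that compares the retrained model with a required minimum and with the model currently in production. This is an illustrative sketch only: the metric name `map` (mean average precision, a common object detection metric), the thresholds and both function names are assumptions.

```python
def next_version(existing_versions):
    """Give each retrained model a new, increasing integer version."""
    return max(existing_versions, default=0) + 1


def validate_candidate(candidate, production, min_required, tolerance=0.0):
    """Return (approved, failures) for a retrained model's metrics."""
    failures = []
    # The candidate must reach the level agreed with the business.
    for name, minimum in min_required.items():
        value = candidate.get(name)
        if value is None or value < minimum:
            failures.append(f"{name}={value} is below the required {minimum}")
    # The candidate must not be worse than the model it would replace.
    for name, production_value in production.items():
        value = candidate.get(name)
        if value is not None and value < production_value - tolerance:
            failures.append(f"{name}={value} is worse than production {production_value}")
    return not failures, failures
```

When the gate fails, the list of failures can be handed to an AI scientist for manual review instead of deploying the model.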
Different tools in MLOps workflow
The customer’s needs and existing technology stack dictate the choices for the MLOps workflow. Different customers have different needs: some prefer framework- or platform-agnostic solutions, while others need vendor-dependent cloud-based solutions on platforms such as Azure, AWS, and Google Cloud. Recently, cloud-based solutions have made many things easier for MLOps. For example, in my experience some steps in the MLOps workflow, such as packaging and taking care of library dependencies, are easier in certain cloud environments.
Here’s a list of some tools that I’ve found useful:
- For data analysis, model development and evaluation, there are modern data analysis tools as well as machine learning and deep learning libraries such as Pandas, Scikit-learn, TensorFlow, PyTorch and Spark ML.
- For workflow orchestration and tasks such as experiment tracking, model tracking, artifact logging and registration, there are tools like MLflow, Kubeflow, the Azure Machine Learning SDK and AWS SageMaker.
- For lifecycle automation, there are tools like Jenkins, Travis CI and Azure DevOps.
The tools mentioned above are just a small subset of what I have used. At Silo AI, we work with several other tools and technologies too, depending on the client and project in question.
Start small, after you’ve done your first PoCs
The trained model will eventually be judged by its ability to meet a business need. Each company using ML therefore needs a standard and repeatable ML development process, similar to the application development process (DevOps), with reusable assets.
The best way is to build a robust MLOps solution, which will
- enforce collaboration and best practices between different teams from AI scientists to product owners through shared and
- reduce the complexity of collaboration by managing assets such as transparent documentation, changes in datasets, source code, libraries and models.
- make it possible to have independently executable steps, which avoids repeating time-consuming tasks.
- significantly reduce the development time.
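To illustrate how independently executable steps avoid repeating time-consuming tasks, the sketch below caches each step's result by step name and parameters, so an unchanged step is not executed again. The `StepCache` class is hypothetical and keeps results in memory; a real pipeline would persist step outputs in storage.

```python
import json


class StepCache:
    """Re-run a pipeline step only when its parameters have changed."""

    def __init__(self):
        self._results = {}
        self.executions = 0

    def run(self, step, **params):
        # Parameters must be JSON-serializable to form a stable cache key.
        key = (step.__name__, json.dumps(params, sort_keys=True))
        if key not in self._results:
            self._results[key] = step(**params)
            self.executions += 1
        return self._results[key]


def preprocess(image_size):
    # Placeholder for a slow pre-processing step.
    return f"images resized to {image_size}px"


if __name__ == "__main__":
    cache = StepCache()
    cache.run(preprocess, image_size=640)
    cache.run(preprocess, image_size=640)  # reuses the stored result
    print(cache.executions)
```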
To get started, it’s good to create the overall big picture and then build towards that vision incrementally in small steps. A skeleton of a simple pipeline is a good enough starting point. After you’ve seen what works and what more you need, move towards more sophisticated pre-processing, training and evaluation steps.
A good time to start building MLOps is right after you’ve proven your ML models create value. With machine learning, you start with data exploration, and with MLOps, you start with building pipelines for the existing ML models.
The difference between code and a model is that you may depend on a piece of code for many years, but ML models will degrade over time. You will have to monitor them constantly and replace older models in production with new ones. Having robust MLOps will decrease the delivery time and help you succeed in machine learning.
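A minimal sketch of such monitoring keeps a sliding window of recent predictions and flags the model once its accuracy in that window falls below a threshold. It assumes that ground-truth labels become available after prediction and it simplifies object detection to a single label per prediction; the class name, window size and threshold are hypothetical.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent predictions in production."""

    def __init__(self, window_size=100, min_accuracy=0.9):
        self._outcomes = deque(maxlen=window_size)  # oldest entries drop out
        self.min_accuracy = min_accuracy

    def record(self, prediction, actual):
        self._outcomes.append(prediction == actual)

    @property
    def accuracy(self):
        if not self._outcomes:
            return None  # nothing observed yet
        return sum(self._outcomes) / len(self._outcomes)

    def is_degraded(self):
        # Judge the model only once the window is full, to avoid noisy alarms.
        window_full = len(self._outcomes) == self._outcomes.maxlen
        return window_full and self.accuracy < self.min_accuracy
```

A degraded monitor is a natural signal for starting a retraining run and replacing the model in production.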
Interested in discussing MLOps with our experts? Get in touch with Pertti Hannelin, our VP of Business Development at email@example.com or via LinkedIn.