I worked as a Solution Architect on building an MLOps platform that is offered as a service and aims to be the backbone for most of the ML operations running in production at one of the largest financial institutions in Sweden. The platform was built to solve a few very specific problems:
- The organization wanted to establish processes and tools to release machine learning models with confidence.
- The organization wanted to shorten the time to market for ML products, so we set out to identify common components in the MLOps lifecycle and offer them as a service.
- The organization wanted to stay technology-agnostic when it came to ML/AI. We didn't want to introduce a solution or bring in vendors that would rule out particular ML frameworks or bind us to specific software releases, library versions, or technology stacks.
Releasing with confidence
Releasing machine learning models with confidence and in a trustworthy way is hard, and it has to be tightly integrated with the organization's processes and practices. The MLOps solution we built provided a standardized way to govern and log machine learning operations.
Our solution addressed basic ML concerns around:
- Auditability, reproducibility, and compliance
- Packaging and serving models to different technology stacks
- Monitoring and acting on ML operations and predictions
- Using different ML frameworks and programming languages
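To make the auditability and reproducibility points more concrete, here is a minimal sketch of the kind of lineage record a release process could require for every model: which code it was trained from, with which framework version, on which exact data, and with which parameters. It is an illustration under assumptions, not the platform we built; the `AuditRecord` type, the `build_audit_record` and `to_log_line` helpers, and all field names are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical lineage record attached to every model release."""
    model_name: str
    model_version: str
    framework: str             # free text, so Python and R frameworks both fit
    framework_version: str
    code_commit: str           # version-control commit the model was trained from
    training_data_sha256: str  # fingerprint of the exact training data
    hyperparameters: Dict[str, Any] = field(default_factory=dict)


def sha256_of_bytes(data: bytes) -> str:
    """Fingerprint raw training data so the same inputs can be proven later."""
    return hashlib.sha256(data).hexdigest()


def build_audit_record(model_name: str, model_version: str, framework: str,
                       framework_version: str, code_commit: str,
                       training_data: bytes,
                       hyperparameters: Dict[str, Any]) -> AuditRecord:
    """Hypothetical helper: assemble the record at training time."""
    return AuditRecord(
        model_name=model_name,
        model_version=model_version,
        framework=framework,
        framework_version=framework_version,
        code_commit=code_commit,
        training_data_sha256=sha256_of_bytes(training_data),
        hyperparameters=dict(hyperparameters),
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize with sorted keys so identical releases log identically."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    # Illustrative values only.
    rec = build_audit_record("churn-model", "1.4.0", "scikit-learn", "1.3.2",
                             "a1b2c3d", b"customer_id,label\n1,0\n",
                             {"max_depth": 5})
    print(to_log_line(rec))
```

Because the data is fingerprinted and the JSON is serialized with sorted keys, two releases built from the same inputs produce identical log lines, which makes it straightforward to show that a model can be reproduced.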
Besides introducing technical components, our solution also helped establish the right processes for releasing ML models. In particular, we emphasized identifying the people responsible for maintaining our ML production pipelines and for acting on model predictions.
Building the MLOps platform
We started small by identifying relatively simple but high-impact ML use cases in the organization and making support for those use cases our end target. At first, we decided to work with our traditional software development tools rather than bringing in tools such as MLflow or Kubeflow. We wanted to know how far we could go with the traditional software development stack (Tip: not that far).
We spent time defining the business problem that each ML use case was trying to solve and analyzing how we were going to operationalize and monitor the model. It was crucial for us to know how the business owners were going to interact with the model and how we could collect feedback on the business value the model was creating. We needed to design the relevant data pipelines and deploy suitable monitoring tools. Throughout the project, we continuously focused on defining which data we were going to use, who would be responsible for the 'health' of that data, and which SLAs should govern the delivery of that data to our productized ML model. DataOps activities were an important part of our ML productization process.
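The data side of this can start small. The sketch below assumes, hypothetically, that every data source feeding a model has a named owner and a maximum allowed delivery delay agreed in its SLA, and it flags late or missing deliveries so that the owner can be alerted. `DataSLA`, `SLABreach`, and `check_all` are illustrative names, not part of any specific DataOps tool; in practice a check like this would typically run as a scheduled job, for example in an orchestrator such as Airflow.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataSLA:
    """Hypothetical SLA for one data source feeding a productized model."""
    source: str
    owner: str            # person responsible for the 'health' of this data
    max_delay: timedelta  # how old the latest delivery is allowed to be


@dataclass(frozen=True)
class SLABreach:
    source: str
    owner: str
    delay: Optional[timedelta]  # None means no delivery has been recorded at all


def check_freshness(sla: DataSLA, last_delivery: Optional[datetime],
                    now: datetime) -> Optional[SLABreach]:
    """Return a breach if the latest delivery is missing or too old, else None."""
    if last_delivery is None:
        return SLABreach(sla.source, sla.owner, None)
    delay = now - last_delivery
    if delay > sla.max_delay:
        return SLABreach(sla.source, sla.owner, delay)
    return None


def check_all(slas: List[DataSLA], deliveries: Dict[str, datetime],
              now: Optional[datetime] = None) -> List[SLABreach]:
    """Check every SLA; `deliveries` maps source name -> last delivery time (UTC)."""
    if now is None:
        now = datetime.now(timezone.utc)
    breaches = []
    for sla in slas:
        breach = check_freshness(sla, deliveries.get(sla.source), now)
        if breach is not None:
            breaches.append(breach)
    return breaches
```

Returning the owner together with each breach keeps the check tied to the responsibility question above: every alert has a person to go to.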
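In general, even if each ML use case comes with distinctive requirements, all machine learning use cases should meet certain prerequisites around ML model and data governance. The European Commission's High-Level Expert Group on AI has described '7 key requirements that AI systems should meet to be deemed trustworthy'. These requirements served as guidelines in our ML projects and in building the MLOps processes:
- Human agency and oversight
- Technical robustness and safety
- Privacy and data governance
- Transparency
- Diversity, non-discrimination and fairness
- Societal and environmental well-being
- Accountability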
Choosing the technologies to address our requirements
From the beginning, we wanted our MLOps platform to be as technology-agnostic as possible. We wanted our data scientists to be able to use whichever ML framework best fit their purpose, and we wanted to be able to train, deploy, and operate the models in the environment best suited to each use case's requirements. For example, in some use cases we wanted to train and deploy models on GCP (Google Cloud Platform); for others, our big data platform was a better match; and for the rest, we ran the models behind APIs in our on-prem, cloud-like environment. We also wanted to support both Python and R.
One requirement we had for our MLOps platform was that it should adapt to the technologies and tools that the organization and its data engineers had already set up (such as Airflow and Jupyter Notebooks). In general, I think the MLOps technology stack always needs to be tailored to the organization's existing tools and technologies, so that proper ML governance can be added on top of them.
There are many MLOps tools available, both commercial and open source. Every organization needs to pick the ML tools that suit its needs and extend its current capabilities with AI. For example, if your organization already uses Kubernetes, it makes sense to start with the Kubeflow tool suite. On the other hand, if your organization already has a lot of infrastructure in place for data orchestration and data processing, adopting a full ML platform that brings its own orchestration tools, data storage, and processing solutions might be more than you need.
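One way to stay framework-agnostic is to make the serving layer depend only on a thin contract rather than on any ML library. The sketch below assumes a hypothetical `ServableModel` protocol with a single `predict` method, plus an adapter that wraps any scoring function; a model trained in another framework, or an R model exposed behind an HTTP API, would get its own adapter. None of these names come from a real tool.

```python
from typing import Any, Callable, Dict, List, Mapping, Protocol, runtime_checkable


@runtime_checkable
class ServableModel(Protocol):
    """Hypothetical framework-neutral contract every deployed model satisfies."""
    name: str

    def predict(self, records: List[Mapping[str, Any]]) -> List[float]:
        ...


class CallableModel:
    """Hypothetical adapter: wraps any per-record scoring function."""

    def __init__(self, name: str, score: Callable[[Mapping[str, Any]], float]):
        self.name = name
        self._score = score

    def predict(self, records: List[Mapping[str, Any]]) -> List[float]:
        return [float(self._score(r)) for r in records]


class ModelRegistry:
    """Serving side: depends only on the contract, never on an ML framework."""

    def __init__(self) -> None:
        self._models: Dict[str, ServableModel] = {}

    def register(self, model: Any) -> None:
        # isinstance on a runtime_checkable Protocol only checks that the
        # attributes exist; it does not validate signatures or return types.
        if not isinstance(model, ServableModel):
            raise TypeError("model does not implement the ServableModel contract")
        self._models[model.name] = model

    def predict(self, name: str, records: List[Mapping[str, Any]]) -> List[float]:
        return self._models[name].predict(records)
```

The design choice is that the platform only ever calls `predict`, so swapping a framework or a library version touches one adapter rather than the serving, monitoring, or logging code.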
What every organization building MLOps processes should keep in mind
Securing trusted ML deliveries and fulfilling all the necessary requirements is about setting up the right processes, with clearly identified steps, roles, and responsibilities. MLOps tools should play a supporting role in these processes. For example, before a model reaches production, it should have passed a risk exercise, and proper monitoring tools should be in place so that the relevant stakeholders can understand how the model makes decisions. The MLOps platform should also offer fairness checks.
To conclude, identifying a well-performing model and putting it into production is just the start of the MLOps journey; continuous monitoring and evaluation of models is the key.
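As an illustration of a fairness check such a platform could offer, the sketch below computes the demographic parity difference: the gap between the highest and lowest rate of positive predictions across the groups of a sensitive attribute. The choice of metric, the 0.1 threshold, and the function names are assumptions for illustration; which fairness metric is appropriate depends on the use case and would be decided as part of the risk exercise.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def positive_rates(predictions: Sequence[int],
                   groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Share of positive (== 1) predictions within each group."""
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have the same length")
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions: Sequence[int],
                                  groups: Sequence[Hashable]) -> float:
    """Max minus min positive rate across groups; 0.0 means equal rates."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def fairness_gate(predictions: Sequence[int], groups: Sequence[Hashable],
                  threshold: float = 0.1) -> bool:
    """Hypothetical release gate: pass only if the parity gap is within threshold."""
    return demographic_parity_difference(predictions, groups) <= threshold
```

A gate like this only flags a disparity; deciding whether it is acceptable remains a job for the people identified as responsible for the model.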
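Continuous monitoring often starts with checking whether the data a model sees in production still resembles its training data. A common heuristic for this is the Population Stability Index (PSI), the sum over bins of (actual share minus expected share) times the log of their ratio; the sketch below computes it over quantile bins taken from the reference data. The binning scheme, the small epsilon for empty bins, and the 0.2 alert threshold are conventional rules of thumb assumed here, not values from our platform.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def bin_edges(reference: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior quantile edges computed on the reference (training) data."""
    ordered = sorted(reference)
    return [ordered[int(len(ordered) * k / n_bins)] for k in range(1, n_bins)]


def proportions(values: Sequence[float], edges: List[float],
                eps: float = 1e-6) -> List[float]:
    """Share of values per bin; eps keeps empty bins from producing log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def psi(reference: Sequence[float], production: Sequence[float],
        n_bins: int = 10) -> float:
    """Population Stability Index of production data against the reference."""
    edges = bin_edges(reference, n_bins)
    expected = proportions(reference, edges)
    actual = proportions(production, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def drift_alert(reference: Sequence[float], production: Sequence[float],
                threshold: float = 0.2) -> bool:
    """Rule of thumb: a PSI above about 0.2 is often read as significant drift."""
    return psi(reference, production) > threshold
```

Each term of the sum is non-negative, so the index is zero only when the binned distributions match, and it grows as production data moves away from what the model was trained on.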
If you want to learn more about setting up MLOps processes and tooling, take a look at our Reliable and scalable AI with MLOps eBook.
Interested in discussing MLOps with our experts?
Get in touch with Pertti Hannelin, our VP of Business Development, at email@example.com or via LinkedIn.