Emil Eirola is a senior machine learning expert with over 10 years of experience in applied machine learning projects (LinkedIn). He was one of the first hires at Silo.AI’s Helsinki office and has therefore gained extensive experience across many kinds of client projects. Emil has been involved in all stages of an AI project, from identifying opportunities and researching modelling approaches to designing a solution and implementing the production deployment.
Before joining us, Emil focused on his scientific endeavors, which gave him a strong understanding of mathematics and statistics. He received his PhD in machine learning from Aalto University in 2014 and went on to work as a postdoctoral researcher at Arcada UAS. During his academic career, Emil has written over 30 papers with more than 400 citations (Google Scholar).
Scoping the problem to apply machine learning to real cases
Emil’s portfolio of applied machine learning includes cases from finance, transportation and manufacturing, to mention a few. Many projects share the goal of better understanding the current situation or foreseeing a future state of affairs. Such cases include resource planning for the transportation industry and developing a dynamic pricing model for sales teams. Emil’s best-known project is the water treatment prediction solution he built for the global consulting engineering group Ramboll.
As an AI scientist, Emil strives to understand the client’s problem, and then to scope and re-scope it from the perspective of the machine learning model.
“The difference between the scientific world and the industry applications is that in academic projects you have some data with a given prediction task and a model you want to research. In real-world cases you have some data, but not necessarily at the level you would ideally need”, Emil says.
In Emil’s experience, the initial idea can sometimes be too ambitious:
“The actual system might turn out to be so complex that you don’t quite have enough data to model it properly. Then you need to narrow the scope so that the model can give you insights that are worth something. The real interest is in the data itself and what can come out of it.”
Re-scoping the problem is often about finding a part of the data that is less complex and starting from there. Emil enjoys this early phase of a project, when he can be creative and try out different solutions. He believes his skills and expertise with a wide range of models are at their best at the beginning of a project, when he gets to explore and form hypotheses.
Making sense of missing data
Many of Emil’s projects have included time-series data. Although digitalization has improved the data sources available, discovering patterns that unfold over longer time periods also requires longer time series. For example, data covering one or two years is plenty for some purposes, but might not be enough to reveal seasonal dependencies.
“With any new project you need to start by looking into what data exists and what could be the advantage of that data. The challenge is that often the data is not collected with machine learning models in mind, so there might be a lot of missing values. The data needs to be preprocessed and treated in many ways before it becomes usable”, Emil points out.
However, as Emil knows, there are solutions to missing data, and inventing data to fill in the gaps is usually not the best idea. Emil wrote his PhD dissertation on “Machine learning methods for incomplete data and variable selection”. At Silo.AI, Emil has applied this knowledge in most of his projects and has hosted a Research Club session for the rest of the team to help them cope with these challenges.
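To illustrate why filling gaps with invented values can mislead, here is a minimal sketch in Python. It is not the method from Emil’s dissertation; the sensor readings are made up for the example, and the helper names are hypothetical. It compares the variance estimated from the observed values only with the variance obtained after replacing every gap with the mean.

```python
import statistics

# Hypothetical sensor readings; None marks a missing measurement.
readings = [4.0, None, 6.0, 8.0, None, 10.0, None, 12.0]


def observed_only(values):
    """Keep only the values that were actually measured."""
    return [v for v in values if v is not None]


def mean_filled(values):
    """Replace every gap with the mean of the observed values."""
    fill = statistics.fmean(observed_only(values))
    return [fill if v is None else v for v in values]


observed = observed_only(readings)
filled = mean_filled(readings)

# Sample variance from the measured values alone.
var_observed = statistics.variance(observed)  # 10.0

# Sample variance after filling: the invented points sit exactly on the
# mean, so the spread of the data looks smaller than it really is.
var_filled = statistics.variance(filled)  # about 5.71

print(f"variance (observed only): {var_observed:.2f}")
print(f"variance (mean-filled):   {var_filled:.2f}")
```

In this toy case the filled series looks roughly half as variable as the measured one, so any downstream estimate of uncertainty would be too optimistic.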
A strong mathematics background helps to deeply understand the data
Emil has become a machine learning expert from the ground up, starting with the fundamental mathematics and statistics. His knowledge of many statistical models helps him start each project by truly delving into the data and testing out different aspects of it to understand it thoroughly. He experiments with simple linear models and correlations to discover which models he could apply to the data.
Emil prefers to begin with simple models:
“Simpler models are more reliable and easier to troubleshoot. Another big reason for this is that unlike with some fancy deep learning model, you don’t need so much data to make it work. Statistically speaking you can more accurately calculate the reliability of the model, and the variability caused by having a limited data sample.”
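One way to make the point about reliability concrete is the sketch below. It is an illustration under stated assumptions, not a description of Emil’s actual workflow: the data is synthetic, the function names are hypothetical, and the percentile bootstrap is just one of several ways to estimate how much a fitted slope varies when the sample is small.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_slope_interval(xs, ys, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the slope of the fitted line."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        # Resample (x, y) pairs with replacement and refit the line.
        idx = [rng.randrange(n) for _ in range(n)]
        slope, _ = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        slopes.append(slope)
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_resamples)]
    upper = slopes[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic data (an assumption for this example): y = 2x + 1 plus noise,
# with only 30 observations to mimic a limited data sample.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(30)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0, 1.0) for x in xs]

slope, intercept = fit_line(xs, ys)
lower, upper = bootstrap_slope_interval(xs, ys)
print(f"slope = {slope:.3f}, 95% bootstrap interval = [{lower:.3f}, {upper:.3f}]")
```

With a two-parameter model, even 30 points give an interval that is narrow enough to be useful, and the width of that interval directly shows the variability caused by the limited sample.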
The business needs determine whether or not the model is feasible
When applying machine learning to a client’s business, it is important to understand that the required model accuracy varies from case to case. Sometimes a decent outcome is already better than what the current system delivers, and that improvement is enough. It’s better to start with something than to leave machine learning out completely just because the data is not ideal.
“We need to determine the KPIs and the needed success rate with the client. This is not always an easy task. Understanding the client’s business model and what machine learning can mean for their clients is of great value.”
Getting inside the client’s world is important both for the success of the project and for Emil. He enjoys working with passionate people who truly care about their data and about improving it. In the end, the true value of a machine learning model can only be seen when it is applied to the client’s business.
Music fills the free time
When not at work, Emil tours Europe with one of his two heavy metal bands; the most recent trip covered eight gigs in Germany and France. Emil plays the guitar and the bass guitar. Making their own music and recording albums lets the other side of Emil’s creativity thrive.
Favorite Silo.AI value?
“My favorite is Keep Learning. To me it stands for self-development and continuously seeking ways to improve yourself. In my field the technology advances all the time, and therefore you need to be aware of the latest developments. I typically read new scientific articles on a weekly basis.”