Similarly to AI, edge AI should be approached as an umbrella term. Edge AI is not a model but an entire technology stack. Like AI, it encompasses an incredible range of technologies and methods. Niko Vuokko, CTO at Silo AI, was recently invited to AMLD, one of the largest machine learning and AI events in Europe, to give a keynote about edge AI, its different applications, and case studies. Below, we have gathered insights from Niko’s talk, explaining what edge AI is, what it consists of, and what is needed to make it work in a real-life setup.
Why and when to deploy AI on edge
Edge AI is an awkward term. If we just consider all the different technologies and contexts involved in edge and edge AI, then edge AI appears like a tropical rainforest full of detail. This is in stark contrast to the large-scale but relatively standardized world of the cloud.
AI, just like software in general, benefits strongly from economies of scale. In other words, the larger the scale in users, data, or sensors, the more AI use cases can be successfully deployed and the more ambitious the algorithms that can be used. This means that if there is a way to centralize the solution and run the AI in the cloud, then that’s how it’s going to be done.
So the key question we at Silo AI ask when considering the use of edge AI is, “Why can’t we just run this in the cloud?”. The answers to this question generally fit one of the following four key reasons.
First of all, it can be forbidden, either by law or by various contracts. Whether for trade secrets, security classification, or privacy reasons, limits are often in place on how data is stored and processed. One example of this in Silo AI’s work includes the case of visually verifying that personal protective equipment (PPE) is correctly worn. In this case, in addition to the common data protection regulations, workforce employment contracts also set limits on acceptable data processing to protect the rights of individuals.
Second, it can simply be too expensive to transmit the data. For example, we have built AI solutions to run within an environmental sensor on an 8-bit microcontroller. The quality of the sensor gradually degrades due to ongoing chemical reactions, and machine learning aims to mitigate this effect to keep sensor results accurate. With a limited power and cost budget, high-frequency data transmission to the cloud is not feasible, so the AI compute must run within the sensor.
The third point is reliability, especially when dealing with business- or safety-critical processes. For example, in factories that are increasingly deploying machine learning, production cannot go offline simply because an AI solution loses connectivity. With AI operating essential parts of processes, safety-critical tasks and factory production lines depend on local AI deployments. One such example is our AI deployment in a hospital environment, where external networks do not exist or are not allowed. Therefore, deploying every part of the solution locally and ensuring the AI runs with high reliability at all times is crucial.
Finally, for many use cases, even 100 milliseconds of network latency is simply too much. For example, with an event camera monitoring the surroundings of a vehicle, the detection of a pedestrian stepping off the sidewalk must reach the car controls within 10 milliseconds of the first sensor signal. Aside from the obvious reliability requirements, the latency constraint here sets limits not only on model complexity but also on all the interactions between the hardware and software stack.
Understanding Edge AI – how to plan and build it
At Silo AI, we acknowledge that edge AI is complex. Acknowledging this helps to set expectations, but it does not by itself lead to actual solutions. Based on our experience with real-world development and deployment projects, we’ve found two simple principles for designing AI for the edge:
- Despite the widely varying contexts of edge AI, there is a common structure available for approaching the design. While the answers are different each time, the questions remain the same.
- The parts of this common structure are highly interconnected, requiring deep collaboration across the different development areas.
Below, we provide a high-level overview of this structure and examples of why it matters. Overall, our approach to edge AI revolves around the five areas of the diagram: Hardware, Data, Software, AI Modeling, and Operations.
With AI entering the scene, hardware matters as the new power-hungry but essential product element. It might be everything from the right choice of external or integrated AI accelerator hardware to squeezing out performance with hardware-specific optimizations.
As one example of hardware and AI interplay, consider the cars, phones, and laptops around you. There is an ongoing need in the industry to remove physical sensor hardware from all of them, replace the sensor functionality with AI, and thus cut down the production cost and the ultimate price to the consumer of the end-product.
Understanding memory bandwidths and thermal loads (the heat created by running AI computation), power budgets, middleware, chip instruction sets, etc., allows for detailed optimization. This detailed optimization helps reduce the hardware requirements of the AI solution and improve the end-user experience. This may mean, for example, custom AI model architectures, middleware adaptation, or static tuning. Another example tool is compute placement, in which different parts of a deep neural network are computed on different chips available within the device. While transmitting data from one chip to another takes precious time, this may cut down the overall computation time and energy consumption in many cases.
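The compute-placement idea above can be sketched as a small search problem: assign each network stage to a chip so that compute time plus inter-chip transfer time is minimized. The chip names, per-stage timings, and transfer cost below are purely illustrative assumptions, not measurements from any real device:

```python
from itertools import product

# Hypothetical per-stage compute times (ms) on two chips: a CPU and an NPU.
# All values are invented for illustration.
compute_ms = [
    {"cpu": 4.0, "npu": 1.0},   # stage 0: convolutional backbone
    {"cpu": 2.0, "npu": 0.5},   # stage 1: feature pyramid
    {"cpu": 1.0, "npu": 3.0},   # stage 2: post-processing (NPU-unfriendly)
]
transfer_ms = 1.5  # assumed cost of moving activations between chips


def best_placement(stages, hop_cost):
    """Brute-force the chip assignment that minimizes end-to-end latency."""
    chips = list(stages[0])
    best = None
    for assignment in product(chips, repeat=len(stages)):
        total = sum(stages[i][c] for i, c in enumerate(assignment))
        # Pay a transfer penalty whenever consecutive stages change chips.
        total += hop_cost * sum(
            1 for a, b in zip(assignment, assignment[1:]) if a != b
        )
        if best is None or total < best[0]:
            best = (total, assignment)
    return best


latency, plan = best_placement(compute_ms, transfer_ms)
print(plan, latency)
```

Even with the transfer penalty, the optimal plan here moves the final stage back to the CPU, illustrating how paying for one chip-to-chip hop can still reduce total latency. Real placement tools work on much larger graphs and use measured profiles rather than brute force.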
The questions related to data are just as important and often quite different on edge than in the cloud. This includes dealing with competing applications for quality of service, designing and sharing complex data across use cases, or simply handling the number of different sensor interfaces coming into the device and ensuring that they work.
With devices running an increasing number of both traditional and AI-based algorithms, it is becoming essential to pick optimal data interfaces for sharing data. Significantly, such interfaces also help share compute across different software modules and, within AI models, across different branches of deep neural networks.
Data questions have direct implications for AI modeling. Selecting the best-performing algorithm for a problem can no longer solely focus on benchmarking models in isolation. Instead, new criteria must also account for device deployment. Latency and throughput requirements must be met within the computational budget allotted by the hardware and given the constraints of sharing those resources with other services. With an explosive growth in compute needs as more AI-based features are introduced, models must be designed from the ground up as modular systems where different functionalities are not considered as separate tasks but as extensions of a common framework. This also opens up new opportunities to improve performance with neural data fusion, which can lead to significant gains over individual task-specific models. Methods such as multi-task learning will be instrumental in training and evaluating such models and deciding when to promote them to production.
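As a minimal illustration of the modular-model idea, the NumPy sketch below shares one backbone between two hypothetical task heads instead of running two full models. The layer sizes and task names are invented for the example; the point is that shared features are computed once and the parameter (and compute) budget shrinks:

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(n_in, n_out):
    """Weight matrix for one dense layer (bias omitted for brevity)."""
    return rng.normal(size=(n_in, n_out))


# One shared backbone feeding two task heads, instead of two separate models.
backbone = linear(64, 32)        # shared feature extractor
head_detect = linear(32, 4)      # hypothetical detection head
head_segment = linear(32, 8)     # hypothetical segmentation head

x = rng.normal(size=(1, 64))     # one input frame (flattened features)
features = np.tanh(x @ backbone)  # computed once, reused by both heads
out_detect = features @ head_detect
out_segment = features @ head_segment

# Parameter budget: shared backbone vs. duplicating it per task.
shared_params = backbone.size + head_detect.size + head_segment.size
separate_params = 2 * backbone.size + head_detect.size + head_segment.size
print(shared_params, separate_params)
```

In multi-task training, both heads would be optimized jointly against a shared loss, which is where the neural data fusion gains mentioned above come from.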
The examples above have demonstrated that edge AI should not simply be viewed as models running on devices but as a comprehensive software stack with tight integration to hardware. While AI permits building solutions that can be efficiently mass customized to different end-users, utilizing this machinery to the fullest won’t be possible unless other software components are also developed to be flexible. The road to success involves treating AI development as an integral part of the solution development process. AI isn’t haphazardly plastered onto a solution but instead interfaced to other software components through meticulously planned APIs that explicitly define the operational limits for edge AI. On the practical side, this will also influence how development teams organize, creating a rich environment that promotes cross-competence collaboration and fosters innovations.
Operations tooling is the ingredient that elevates edge AI from single device proofs-of-concept to scalable deployments on fleets spanning hundreds of devices. Closely paralleling the now ubiquitous cloud tooling, edge MLOps aims to address the full lifecycle management of AI solutions covering everything from orchestration to solution diagnostics and updates. However, having physical devices adds a unique twist to the mix. Workflows must be extended to include steps like device provisioning and management, fleet analytics, shadow deployments, and hardware-specific quality assurance pipelines. Furthermore, hybrid processes are needed to decide what to compute on edge vs. in the cloud and to control which data to transfer.
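One way to picture such a hybrid edge-vs-cloud decision is a simple cost comparison per inference. The function and every parameter value below are hypothetical, a sketch of the reasoning rather than production placement logic:

```python
def place_workload(latency_budget_ms, payload_mb, uplink_mbps,
                   edge_ms, cloud_ms, rtt_ms):
    """Pick edge vs. cloud for one inference from rough cost estimates.

    All inputs are illustrative knobs, not values from any real system.
    """
    # Time to ship the payload over the uplink (MB -> Mbit -> ms).
    transfer_ms = payload_mb * 8 / uplink_mbps * 1000
    cloud_total = transfer_ms + rtt_ms + cloud_ms
    # Prefer the cloud only when it is both faster and within budget.
    if cloud_total < edge_ms and cloud_total <= latency_budget_ms:
        return "cloud"
    if edge_ms <= latency_budget_ms:
        return "edge"
    return "infeasible"


# Heavy payload on a slow uplink: compute locally.
print(place_workload(100, payload_mb=2.0, uplink_mbps=10,
                     edge_ms=40, cloud_ms=5, rtt_ms=30))
# Tiny payload, slow device: offload to the cloud.
print(place_workload(100, payload_mb=0.01, uplink_mbps=10,
                     edge_ms=60, cloud_ms=5, rtt_ms=30))
```

A real hybrid scheduler would also weigh energy, data-transfer policies, and reliability, but the shape of the trade-off is the same.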
Investing in a long-term development plan centralized on the idea of building a software stack that gets extended with incremental feature additions through over-the-air upgrades pays off. To give an example based on our experience, in one of our projects, we helped an industrial manufacturer launch a quality control system across multiple factories globally. Our client produces hundreds of different items, making it infeasible to develop the system as a static solution that is trained once and then operated in perpetuity as production data is heterogeneous and changes over time. By implementing a framework that spans factory devices and the cloud, we were able to tackle the issue systemically. Initial models were tailored to a particular site, rapidly built, and rolled out to production, with selective data synchronization providing a solid foundation to drive further development and adaptation.
Operating AI on edge also raises a host of new challenges. One of the key challenges is how to safely operate the system in case of hardware or other failures. Whereas in the cloud, a fresh copy can simply be spawned on a virtual machine, the lack of proper error handling and recovery in an edge solution will lead to catastrophic consequences, for example, when autonomously controlling heavy machinery. To avoid such issues, it is essential to constantly monitor model input data for signs of drift and other anomalies. In addition, it is important to granularly inspect model outputs to detect when operational boundaries are exceeded and have self-correcting mechanisms in place that allow the solution to adapt to changes in the local environment directly on the device without relying on a stable internet connection (read more about how we set up MLOps for the biggest financial institution in Sweden).
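A minimal sketch of such input-drift monitoring, assuming a scalar input statistic and a known calibration baseline (real systems would track richer, per-feature statistics, model outputs, and confidences):

```python
from collections import deque


class DriftMonitor:
    """Flag inputs whose running mean departs from a calibration baseline.

    A deliberately simple sketch: the baseline, window size, and threshold
    are illustrative assumptions, not tuned values.
    """

    def __init__(self, baseline_mean, baseline_std, window=50, threshold=3.0):
        self.baseline_mean = baseline_mean
        self.baseline_std = baseline_std
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Return True when the windowed mean drifts past the threshold."""
        self.window.append(value)
        mean = sum(self.window) / len(self.window)
        z = abs(mean - self.baseline_mean) / self.baseline_std
        return z > self.threshold  # True => drift alarm, fall back to safe mode


monitor = DriftMonitor(baseline_mean=0.0, baseline_std=1.0)
# Inputs oscillating near the baseline raise no alarm...
steady = [monitor.observe(0.1 * ((-1) ** i)) for i in range(50)]
# ...while a sustained shift in the input distribution eventually does.
drifted = [monitor.observe(5.0) for _ in range(50)]
print(any(steady), drifted[-1])
```

The alarm would then trigger whatever self-correcting or safe-mode behavior the device supports, without requiring a cloud round trip.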
As one somewhat different sensor-related example, Silo AI implemented real-time sensor calibration for a car manufacturer’s autonomy system. As metal gets hot and expands in direct sunlight, it changes the relative positions and attitudes of sensors, causing trouble for all the modules dependent on their data. Our real-time calibration system uses deep learning computer vision to monitor the various incoming vision data streams and detect how their behavior changes compared to each other, thus completing the loop from sensors to models and back.
With this post, we have discussed some of the questions you need to keep in mind when you start working on edge AI. For example, what are the non-negotiables in user experience, product cost, or development timelines? What can the latest AI technology offer in model accuracy and efficiency? What decisions do you need to make for the hardware and software surrounding the models? And what are the technical areas most critically in need of collaboration with AI expertise, and how to organize around that?
Let’s learn together
We hope these questions help you get started with edge AI in addition to your existing cloud operations. You may also be interested in a slightly more technical article on AI in restricted environments by our Solutions Architect Jukka Yrjänäinen.