In a nutshell, sensor fusion is about combining information from multiple sensors to obtain a more accurate picture than any single sensor could provide by itself. For example, autonomous land vehicles need to accurately determine their own position, orientation, and state of motion with respect to their surroundings. They also need accurate information about all other dynamic actors in their vicinity, such as pedestrians and other automobiles.

From a theoretical point of view, sensor fusion is firmly based on understanding probability as a state of knowledge, which allows us to combine and manipulate different sources of information via the methods of Bayesian statistics. From a more practical point of view, real-time numerical sensor fusion is viable primarily because Bayesian combinations of Gaussian probability densities yield Gaussian densities as a result. Thus if we can approximate our current state of knowledge and the incoming information using Gaussian probability densities, we can leverage analytical formulae and numerical linear algebra. This allows us to efficiently compute our updated state of knowledge.

In this blog article, I will briefly review some of the basic mathematical and statistical building blocks of sensor fusion. As an AI scientist with a background in applied mathematics and theoretical physics, my intention is to provide some condensed, yet precise glimpses into the theoretical machinery that powers sensor fusion. We have covered sensor fusion and its real-world applications from a general point of view in our previous blog article.

## Combining information the probabilistic way

The goal in sensor fusion applications is typically to obtain an accurate estimate of the state of some system. The state is usually most conveniently represented by an 𝑛-dimensional vector, \(x\in\mathbb{R}^n\). The vector could contain, for example, the position and orientation of an autonomous vehicle, together with their time derivatives, with respect to some coordinate system. We can use \(H_x\) to denote the proposition that the state vector is 𝑥 (see [1] for some technical details we are omitting). Further, we let 𝐷 denote the data we have just obtained, and 𝐼 denote any other relevant information we might have about the problem.

We are ultimately interested in the quantity \(P(H_x|DI)\), the probability that the state is 𝑥, conditional on the data and everything else we know about the problem. Typically, we are dealing with continuous state variables, such as the position or velocity of a moving vehicle. In the continuous case, the object of interest is actually the probability density \(f_H(x|DI)\) of the state 𝑥.

To get there, we can start from the probability that both \(H_x\) and 𝐷 are true, \(P(H_x D|I)\). Using the product rule of probability theory, we can write this in two equivalent forms $$𝑃(𝐻_𝑥𝐷|𝐼) = 𝑃(𝐻_𝑥|𝐷𝐼)𝑃(𝐷|𝐼) = 𝑃(𝐷|𝐻_𝑥𝐼)𝑃(𝐻_𝑥|𝐼), \quad (1)$$ from which we find $$𝑃(𝐻_𝑥|𝐷𝐼) = \frac{𝑃(𝐷|𝐻_𝑥𝐼) 𝑃(𝐻_𝑥|𝐼)}{𝑃(𝐷|𝐼)}, \quad (2)$$ which is known as Bayes’ theorem.

The role of 𝑃(𝐷|𝐼) in sensor fusion applications is mainly as a normalization constant, which, if necessary, we can obtain from $$𝑃(𝐷|𝐼) = \int 𝑃(𝐷|𝐻_𝑥𝐼)𝑃(𝐻_𝑥|𝐼)\mathrm{d}𝑥, \quad(3)$$ with a slight abuse of notation, since the \(\{𝐻_𝑥|𝑥 \in\mathbb{R}^n\}\) are assumed to be a mutually exclusive and exhaustive set of propositions. Likewise, unless we really do have some relevant prior information, the prior distribution \(𝑃(𝐻_𝑥|𝐼)\) can be assumed to be non-informative, typically a constant as well. The end result is that we find $$𝑃(𝐻_𝑥|𝐷𝐼) \propto 𝑃(𝐷|𝐻_𝑥𝐼), \quad (4)$$ or that the posterior probability is proportional to the likelihood.
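To make equations (2) to (4) concrete, here is a minimal numerical sketch for a one-dimensional state. The grid, the sensor reading, the measurement variance, and the Gaussian form of the likelihood are all illustrative assumptions rather than anything prescribed above; the evidence integral (3) is approximated by a simple Riemann sum.

```python
import math

def gaussian_pdf(z, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Hypothetical 1-D setup: candidate states x on a uniform grid over [0, 10].
dx = 0.01
grid = [i * dx for i in range(1001)]

z_measured = 4.0   # assumed sensor reading
var_sensor = 0.25  # assumed measurement variance

# Likelihood P(D | H_x I) for each candidate state, and a flat prior P(H_x | I).
likelihood = [gaussian_pdf(z_measured, x, var_sensor) for x in grid]
prior = [1.0 / 10.0 for _ in grid]

# Equation (3): the evidence P(D | I), approximated by a Riemann sum.
evidence = sum(l * p for l, p in zip(likelihood, prior)) * dx

# Equation (2): Bayes' theorem evaluated point by point.
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

# Equation (4): with a flat prior the posterior peaks where the likelihood peaks.
x_map = grid[max(range(len(grid)), key=posterior.__getitem__)]
```

Because the prior is constant, the posterior here is just the likelihood rescaled so that it integrates to one over the grid.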

Now, the main point of sensor fusion is that we are combining data from multiple sensors. So let us set \(𝐷 = 𝐷_𝐴 𝐷_𝐵\), by which we denote that we have obtained a datum from sensor 𝐴 as well as from sensor 𝐵. We will keep referring to only these two sensors, although everything in the following readily generalizes to any number of simultaneous measurements.

Using the product rule, we then have $$𝑃(𝐷_𝐴𝐷_𝐵|𝐻_𝑥𝐼) = 𝑃(𝐷_𝐴|𝐷_𝐵𝐻_𝑥𝐼)𝑃(𝐷_𝐵|𝐻_𝑥𝐼) = 𝑃(𝐷_𝐴|𝐻_𝑥𝐼)𝑃(𝐷_𝐵|𝐻_𝑥𝐼), \quad (5)$$ where in the last equality we have assumed the measurements to be statistically independent given the state, so that knowing \(𝐷_𝐵\) tells us nothing more about \(𝐷_𝐴\) once 𝑥 is known. We are finally left with $$𝑃(𝐻_𝑥|𝐷𝐼) \propto 𝑃(𝐷_𝐴|𝐻_𝑥𝐼)𝑃(𝐷_𝐵|𝐻_𝑥𝐼),\quad (6)$$ or that the posterior probability is proportional to the product of the data likelihoods. Correspondingly, for the probability density of the state we have $$𝑓_𝐻(𝑥|𝐷𝐼) \propto 𝑃(𝐷_𝐴|𝐻_𝑥𝐼)𝑃(𝐷_𝐵|𝐻_𝑥𝐼).\quad (7)$$
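Equation (7) can be tried out directly on a grid, without any closed-form results. The sketch below assumes a one-dimensional state, two hypothetical sensor readings, and Gaussian-shaped likelihoods chosen purely for illustration; any other likelihood shape could be substituted in the same way.

```python
import math

def gaussian_pdf(z, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Candidate states x on a uniform grid over [0, 10] (illustrative choice).
dx = 0.01
grid = [i * dx for i in range(1001)]

# Hypothetical readings: sensor A is less precise than sensor B.
z_a, var_a = 4.0, 1.0
z_b, var_b = 6.0, 0.25

# Likelihoods P(D_A | H_x I) and P(D_B | H_x I) as functions of the state x.
lik_a = [gaussian_pdf(z_a, x, var_a) for x in grid]
lik_b = [gaussian_pdf(z_b, x, var_b) for x in grid]

# Equation (7): the posterior density is proportional to the product.
unnormalized = [a * b for a, b in zip(lik_a, lik_b)]
norm = sum(unnormalized) * dx
posterior = [u / norm for u in unnormalized]

# Summary statistics of the fused state of knowledge.
posterior_mean = sum(x * p for x, p in zip(grid, posterior)) * dx
posterior_var = sum((x - posterior_mean) ** 2 * p
                    for x, p in zip(grid, posterior)) * dx
```

With these numbers the posterior mean lands between the two readings, closer to the more precise sensor 𝐵, and the posterior variance is smaller than that of either sensor alone.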

## Gaussians in, Gaussians out

To proceed further we need to be able to make sense of the likelihoods \(𝑃(𝐷|𝐻_𝑥𝐼)\). This amounts to specifying how a particular state 𝑥 and a particular measurement are related, probabilistically. For the sake of this blog article, we are making the simplifying assumption that the measurements \(z\in\mathbb{R}^n\) from all sensors belong to the same space \(\mathbb{R}^𝑛\) as our state (see e.g. [2] for when this is not the case).

Since the sensor readouts typically are also real numbers or sets of real numbers, we can write e.g. \(𝐷_{𝐴,𝑧}\) for the proposition that sensor 𝐴 produced a value 𝑧. Defining a probability density \(𝑓_𝐴(𝑧|𝑥)\) parametrized by the state then allows us to write the likelihoods as $$𝑃(𝐷_{𝐴,𝑧}|𝐻_𝑥𝐼) \propto 𝑓_𝐴(𝑧|𝑥).\quad (8)$$

For practical computations, the sensor likelihoods \(𝑓_𝐴\) and \(𝑓_𝐵\) for the sensors 𝐴 and 𝐵 need to have a computationally suitable representation. It turns out that Gaussian densities are extremely convenient for this purpose and also often correspond reasonably well to the real-world properties of measurements. As such, we choose to represent the measurement likelihoods as $$𝑓_𝐴(𝑧|𝑥) = \mathcal{N} (𝑧; 𝑥, 𝑅_𝐴),\quad (9)$$ where \(\mathcal{N} (𝑧; 𝑥, 𝑅_𝐴)\) is the Gaussian probability density of a multivariate normal distribution with a mean of 𝑥 and a covariance matrix of \(𝑅_𝐴\). Here 𝑧 represents the measured value and the covariance matrix \(𝑅_𝐴\) represents the measurement uncertainty. The case for sensor 𝐵 is identical.

We can now leverage the properties of Gaussian densities, which give us $$\mathcal{N} (𝑧_𝐴; 𝑥, 𝑅_𝐴) \mathcal{N} (𝑧_𝐵; 𝑥, 𝑅_𝐵) \propto \mathcal{N} (𝑧; 𝑥, 𝑅),\quad (10)$$ where the proportionality holds as a function of the state 𝑥, \(𝑧 = 𝑅(𝑅^{-1}_𝐴 𝑧_𝐴 + 𝑅^{-1}_𝐵 𝑧_𝐵)\) is the effective measurement value, and \(𝑅 = (𝑅^{-1}_𝐴 + 𝑅^{-1}_𝐵 )^{-1}\) is the effective measurement covariance. Thus the product of Gaussian sensor likelihoods is proportional to a single Gaussian density, with an effective measured value that is a weighted mean of the individual sensor measurements, the weights being the inverse covariances. For a graphical representation of this phenomenon, see Figure 1.
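Equation (10) translates almost line by line into numerical linear algebra. The following is a minimal NumPy sketch, not a production implementation: the function names `gaussian_density`, `fuse_two`, and `likelihood_ratio`, as well as the two-dimensional example readings, are hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_density(z, mean, cov):
    """Multivariate normal density N(z; mean, cov)."""
    diff = np.asarray(z, dtype=float) - np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def fuse_two(z_a, R_a, z_b, R_b):
    """Effective measurement (z, R) of equation (10) for two Gaussian sensors."""
    info_a = np.linalg.inv(R_a)  # inverse covariance of sensor A
    info_b = np.linalg.inv(R_b)  # inverse covariance of sensor B
    R = np.linalg.inv(info_a + info_b)
    z = R @ (info_a @ z_a + info_b @ z_b)
    return z, R

# Hypothetical 2-D readings: sensor A has a third of the variance of sensor B.
z_a, R_a = np.array([0.0, 0.0]), 1.0 * np.eye(2)
z_b, R_b = np.array([4.0, 8.0]), 3.0 * np.eye(2)

z_fused, R_fused = fuse_two(z_a, R_a, z_b, R_b)

def likelihood_ratio(x):
    """Product of the two sensor likelihoods divided by the fused Gaussian."""
    product = gaussian_density(z_a, x, R_a) * gaussian_density(z_b, x, R_b)
    return product / gaussian_density(z_fused, x, R_fused)
```

If equation (10) holds, `likelihood_ratio` returns the same value for every state 𝑥, which is an easy sanity check to run. For larger or poorly conditioned covariance matrices, one would usually prefer solving linear systems over forming explicit inverses; the explicit form is kept here only because it mirrors the formula.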

## The benefits of sensor fusion

While it is clear that the property of products of Gaussians to yield a single Gaussian is useful from a numerical point of view, it is also useful for inferences in general. If we consider instead of two different sensors some larger number 𝑁 of different sensors, the generalization of equation (10) yields $$𝑧 = 𝑅\sum_{i=1}^N 𝑅^{−1}_𝑖 𝑧_𝑖 \quad (11)$$ $$𝑅=\left( \sum_{i=1}^N 𝑅^{−1}_𝑖\right)^{−1}.\quad (12)$$

From equation (11) we see that any measurement \(𝑧_𝑗\) with a particularly large uncertainty is automatically discounted in the computation of the effective measured value. This is because a large covariance \(𝑅_𝑗\) has a small inverse \(𝑅^{-1}_𝑗\), so the product \(𝑅^{-1}_𝑗 𝑧_𝑗\) contributes little to the sum. This is one of the tangible benefits of sensor fusion.

In addition, we see from equations (11) and (12) that even if all \(𝑅_𝑖\) are equal, so that no sensor is favored, we still have \(𝑧 = (\sum_i 𝑧_𝑖)/𝑁\) and \(𝑅 = 𝑅_𝑖/𝑁\). In other words, the covariance of the effective measurement is reduced by a factor of 𝑁 compared to the incoming measurements. Since standard deviations are square roots of variances, this is essentially the 1/√𝑁 law of averaging independent measurements in action, and it is a further advantage of sensor fusion.
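Both effects are easy to reproduce numerically. The sketch below is one possible way to implement equations (11) and (12), by accumulating inverse covariances; the function name `fuse_many` and all readings and covariances are made up for illustration.

```python
import numpy as np

def fuse_many(measurements, covariances):
    """Effective measurement z and covariance R from equations (11) and (12)."""
    dim = len(measurements[0])
    info_matrix = np.zeros((dim, dim))  # running sum of R_i^{-1}
    info_vector = np.zeros(dim)         # running sum of R_i^{-1} z_i
    for z_i, R_i in zip(measurements, covariances):
        info_i = np.linalg.inv(R_i)
        info_matrix += info_i
        info_vector += info_i @ np.asarray(z_i, dtype=float)
    R = np.linalg.inv(info_matrix)
    return R @ info_vector, R

# Three hypothetical 2-D position fixes; the third sensor is very noisy.
readings = [np.array([1.0, 1.0]), np.array([1.2, 0.8]), np.array([9.0, 9.0])]
covs = [np.eye(2), np.eye(2), 1000.0 * np.eye(2)]
z_discounted, R_discounted = fuse_many(readings, covs)

# Four equally accurate sensors: plain average, covariance divided by N = 4.
equal_readings = [np.array([1.0, 2.0]), np.array([3.0, 2.0]),
                  np.array([1.0, 4.0]), np.array([3.0, 4.0])]
equal_covs = [0.4 * np.eye(2)] * 4
z_equal, R_equal = fuse_many(equal_readings, equal_covs)
```

In the first case the outlying reading at (9, 9) barely moves the result away from the average of the two accurate sensors. In the second case the fused value is the arithmetic mean of the readings and the fused covariance is a quarter of the individual one.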

However, if the different sensors produce results whose distributions are geometrically advantageous with respect to each other, as in Figure 1, the uncertainty in the effective measured value can be much smaller still. In these cases, our state of knowledge is greatly enhanced by combining separate uncertain but complementary measurements, and this is what sensor fusion is really all about.
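A small numerical experiment illustrates the geometric effect. The sketch assumes two hypothetical two-dimensional sensors, each precise in one direction and poor in the perpendicular one, and compares fusing them when their uncertainty ellipses are aligned with fusing them when the ellipses are crossed at 90 degrees. The helper names and variances are illustrative assumptions.

```python
import numpy as np

def elongated_cov(angle, major_var, minor_var):
    """Covariance with variance major_var along `angle` and minor_var across it."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.diag([major_var, minor_var]) @ rot.T

def fused_cov(R_a, R_b):
    """Effective covariance of equation (10) for two sensors."""
    return np.linalg.inv(np.linalg.inv(R_a) + np.linalg.inv(R_b))

# Sensor A: very uncertain along the x axis, precise along the y axis.
R_a = elongated_cov(0.0, 4.0, 0.01)
# Sensor B, in two variants: same orientation as A, or rotated by 90 degrees.
R_b_aligned = elongated_cov(0.0, 4.0, 0.01)
R_b_crossed = elongated_cov(np.pi / 2, 4.0, 0.01)

# The largest eigenvalue is the variance along the worst-determined direction.
worst_aligned = np.linalg.eigvalsh(fused_cov(R_a, R_b_aligned)).max()
worst_crossed = np.linalg.eigvalsh(fused_cov(R_a, R_b_crossed)).max()
```

With aligned ellipses the worst-case variance only halves, as the 1/𝑁 rule predicts. With crossed ellipses each sensor covers the other's weak direction, and the worst-case variance drops below the small variance of either sensor.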

## Personal reflections

On a personal level, I have had the opportunity to work with sensor fusion algorithms in a variety of different contexts. These include industrial measurement systems, satellite radar, and most recently at Silo AI, working with maritime Augmented Reality (AR) systems together with Groke Technologies. Across these projects, the same fundamental principles of sensor fusion apply due to the common unifying theme of dynamics and combination of different sources of information. This is despite the fact that the state representations, data, and sensors can be quite different in these different contexts. I have been positively surprised time after time how smoothly such abstract – and some could argue abstruse – mathematical concepts as forms, metrics, entropy, and information translate into powerful numerical algorithms powering our modern civilization.

I personally consider sensor fusion a fascinating field, with a rare combination of some fundamentals of probability theory, a hefty dose of applied mathematics, and plenty of important real-world applications. However, in this article, we have out of necessity only scratched the surface of sensor fusion. As such, there are several interesting facets of sensor fusion that we have not mentioned at all.

For example, we have not touched on the question of how information is propagated through time – an aspect that is crucial for most real-life scenarios. In practice, measurement data comes in as timed sequences, and we wish to constantly update our information to correspond to the present moment based on the latest measurements. This is a topic commonly referred to as filtering, with the main tool in the field being the ubiquitous Kalman filter. However, covering the Kalman filter in all its interesting detail would require a whole blog article of its own. Fortunately, such a blog post already exists, and we refer the reader to the excellent and visually satisfying description [3].

## References

[1] Edwin T. Jaynes. *Probability Theory – The Logic of Science.* Cambridge University Press, Cambridge, United Kingdom, 2019.

[2] Wolfgang Koch. *Tracking and Sensor Data Fusion – Methodological Framework and Selected Applications.* Springer Verlag, Berlin, Heidelberg, Germany, 2014.

[3] Tim Babb. *How a Kalman filter works, in pictures.* https://www.bzarg.com/p/how-a-kalman-filter-works-in-pictures/

*Would you like to work with an AI Scientist like Pauli? For business, get in touch with our VP of Business Development Pertti Hannelin to find out how we could help at pertti.hannelin@silo.ai or via LinkedIn. For open positions, see silo.ai/careers.*