Imagine a following scenario: A company has a complicated machine. Different parts of the machine work in interaction with other parts as well as with the input. Some parts of the input, output and the inner state of the machine are monitored by some sensors. Some sensor reading suggests that the machine is not behaving as it should. To be able to fix the problem, it is necessary to identify the cause of the problem. Preferably fast. Where to start the search?

Often, one uses domain expertise to recognize causes. Engineers who have worked with the machine gather knowledge of what the most common causes are and what further diagnostics are useful. However, it is possible to assist experts with automated tools based on machine learning such as Bayesian networks.

Bayesian networks belong to a class of models called probabilistic graphical models. As the name suggests probabilistic graphical models represent joint probability distributions of several variables in a graphical form; a variable is roughly a quantity whose value may change (e.g., a sensor reading). Probabilistic models are particularly good in handling uncertainty. Instead of just returning the most probable cause, it is possible to get a list of potential causes, each with a probability attached. Furthermore, a graphical representation makes it possible to specify the distribution in a compact form which makes it possible to model distributions with lots of variables.

### How to get a Bayesian network?

To be able to use a Bayesian network, we need to construct one first. To this end, we can ask our domain experts to write down the dependencies between variables and specify local probabilities. However, this approach requires precise knowledge of all minute details of the process and does not scale up to large networks. An alternative way is to learn a Bayesian network from data. We start by collecting data. For example, we can collect snapshots of the values of the variables. Then, we use these data to find a Bayesian network that describes the relations between different variables. There are several types of learning algorithms and some, called score-based algorithms, allow incorporating expert knowledge. Thus, it is possible to learn Bayesian networks by combining both human expertise and data-driven modelling.

### What to do with a Bayesian network?

Once we have our network ready, we can compute probabilities of variables given some values of other variables. This task is called inference. For example, we can compute probabilities of different causes given the observed sensor readings.

The inference with Bayesian networks is not completely unproblematic. With large and complex systems, computing conditional probabilities may take long time. Fortunately, if the Bayesian network has low tree-width (a property of the underlying graph) then inference is computationally feasible also for large networks. This implies that sometimes when we have to make decisions fast we may prefer Bayesian networks with low tree-width even if such models are less accurate. Recently, this topic has been researched actively and both exact, find more from the research papers from NIPS and AISTATS:

- Janne H. Korhonen & Pekka Parviainen: Exact Learning of Bounded Tree-width Bayesian Networks. AISTATS 2013
- Pekka Parviainen, Hossein Shahrabi Farahani & Jens Lagergren: Learning Bounded Tree-width Bayesian Networks using Integer Linear Programming. AISTATS 2014
- Advances in Learning Bayesian Networks of Bounded treewidth. NIPS 2014
- Learning treewidth bounded bayesian networks with thousands of variables. NIPS 2016

If you are interested in technical details about learning bounded tree-width Bayesian networks, please see a recent Silo.AI webinar video here on Learning Bayesian Networks with bounded graph parameters.