# Introduction

The Kalman filter is an optimal estimator for linear systems with Gaussian (normal) error statistics. It is useful for making educated guesses about:

• the value of something given a lot of noisy data (smoothing noisy data)
• the value of something given missing data (estimating missing data)
• the value of something in the future (forecasting future states and observations)

Notable applications of the Kalman filter include the Apollo navigation computer, vehicle tracking, estimation of financial time series and satellite navigation¹.

# How it works

## 1. The model

The Kalman filter assumes that the state of a system evolves from a prior state at time $(t-1)$ according to the equation:
$x_t=F_t x_{t-1} + B_tu_t + w_t$ (1)
where $x_{t-1}$ is the prior state, ($u_t$) is an optional control input and ($w_t$) is a noise term representing uncertainty about the prior state. The noise ($w_t$) is assumed to be multivariate normal with zero mean and covariance matrix ($Q_t$) (called the process noise covariance matrix), which indicates the extent to which each state varies with the others. The matrix ($F_t$), the system (or state-transition) matrix, represents how the system evolves through time.
For example, if the state $x$ is a vector holding the position and velocity of a person moving at a constant speed, then the system matrix $F$ would encode the equations $(s_t = s_{t-1}+v_{t-1}\Delta t)$ and $(v_t=v_{t-1})$, and (taking $\Delta t = 1$) the system matrix would be

$F =\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.
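As a concrete illustration, the constant-velocity model above can be stepped forward in code according to equation 1. This is a minimal sketch: the noise level, number of steps and starting state are arbitrary assumptions, and there is no control input.

```python
import numpy as np

# State x = [position, velocity]; constant-velocity model with dt = 1.
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])

Q = 0.01 * np.eye(2)          # process noise covariance Q_t (arbitrary)
rng = np.random.default_rng(0)

x = np.array([0.0, 1.0])      # start at position 0 with velocity 1
for _ in range(5):
    w = rng.multivariate_normal(np.zeros(2), Q)  # process noise w_t
    x = F @ x + w             # x_t = F x_{t-1} + w_t (no control input)

print(x)  # after 5 steps: position near 5, velocity near 1
```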

The Kalman filter also assumes that observations of the state of the system are a linear function² of the states plus a noise term:
$z_t=H_tx_t+v_t$ (2)
where ($z_t$) is the observation, ($H_t$) is the transformation matrix that maps states into the measurement domain, and ($v_t$) is measurement noise, assumed to be zero mean with covariance ($R_t$).
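For the position/velocity example, measuring position only gives $H = (1\ 0)$. A sketch of generating one observation from equation 2 (the measurement noise level and true state are arbitrary assumptions):

```python
import numpy as np

H = np.array([[1.0, 0.0]])    # measure position only
R = np.array([[0.25]])        # measurement noise covariance R_t (arbitrary)

rng = np.random.default_rng(1)
x = np.array([3.0, 1.0])      # some true state [position, velocity]
v = rng.multivariate_normal([0.0], R)  # measurement noise v_t
z = H @ x + v                 # z_t = H x_t + v_t

print(z)  # a noisy reading of the true position 3.0
```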

## 2. Estimating true state of the system given some observations

The state of the system can not be observed directly due to noise and perhaps also due to a transformation from the state domain to the observation domain. For example, you might be estimating the position of a person based on observations of their speed. Therefore an algorithm is required to estimate the state ($\hat{x_t}$) (position). Estimates of the parameters ($\hat{x_t}$) are not discrete values but described by probability density functions (pdfs). These pdfs are assumed to be Gaussian and their covariances are described by a covariance matrix ($P_t$). The Kalman filter algorithm estimates ($\hat{x_t}$) using a two-step recursive algorithm: 1) state prediction; 2) measurement update. The state prediction estimates the state given the system matrix and knowledge of the state at the previous time step. The measurement update combines information provided by the observations with those of state prediction to generate an updated (improved) estimate of the states.

The Kalman state estimation equations are:
$\hat{x}_{t|t-1}=F_t \hat{x}_{t-1} + B_tu_t$ (3)
$P_{t|t-1}=F_tP_{t-1}F_{t}^T + Q_t$ (4)
where ($P_{t|t-1}$) is the covariance associated with the predicted state vector. Note that the noise term ($w_t$) does not appear in equation 3: it has zero mean, so it contributes nothing to the expected state.
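Equations 3 and 4 translate directly into a predict step. A sketch (matrix names follow the text; since the noise term has zero mean it contributes nothing to the predicted state, only to the predicted covariance via $Q$):

```python
import numpy as np

def predict(x, P, F, Q, B=None, u=None):
    """Kalman prediction step: equations 3 and 4."""
    x_pred = F @ x                   # F_t x_{t-1}
    if B is not None and u is not None:
        x_pred = x_pred + B @ u      # optional control input B_t u_t
    P_pred = F @ P @ F.T + Q         # predicted covariance (eq. 4)
    return x_pred, P_pred

# Constant-velocity example with illustrative numbers
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
x, P = np.array([0.0, 1.0]), np.eye(2)

x_pred, P_pred = predict(x, P, F, Q)
print(x_pred)   # [1. 1.]
```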

The Kalman measurement update equations are:
$\hat{x}_{t}=\hat{x}_{t|t-1}+K_t(z_t-H_t\hat{x}_{t|t-1})$ (5)
$P_{t}=P_{t|t-1}-K_tH_tP_{t|t-1}$ (6)
$K_t = P_{t|t-1} H^T_t(H_tP_{t|t-1}H^T_t+R_t)^{-1}$ (7)

where ($K_t$) is the optimal Kalman gain, a function of the relative certainty of the measurements versus that of the current state estimate. ($z_t-H_t\hat{x}_{t|t-1}$) is the residual of the measurement (the innovation), and ($H_tP_{t|t-1}H^T_t+R_t$) is the covariance of the residual.

Equation 5 states that the updated state is the predicted state ($\hat{x}_{t|t-1}$) plus some proportion of the difference between the measurement ($z_t$) and the prediction. This proportion is determined by the Kalman gain matrix ($K_t$).

Equation 6 states that the updated covariance for the estimate is the predicted covariance minus some proportion of it, i.e. incorporating a measurement always reduces (or at worst preserves) the uncertainty. This proportion is determined by the Kalman gain ($K_t$).

Equation 7 states that the optimal Kalman gain $K_t$ is the current estimate of the state covariance mapped into the measurement domain, ‘divided’ by the residual covariance $(H_tP_{t|t-1}H^T_t+R_t)$. If the residual covariance is small (the measurements are reliable relative to the prediction) then the Kalman gain is large, and equation 5 puts more weight on the residual, pulling the estimate towards the observation. Conversely, if the observations are noisy relative to the state estimate, the gain is small and the observation is largely ignored.
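Equations 5–7 likewise translate directly into an update step. A sketch using illustrative numbers (the predicted state, covariance and measurement below are arbitrary assumptions):

```python
import numpy as np

def update(x_pred, P_pred, z, H, R):
    """Kalman measurement update: equations 5-7."""
    y = z - H @ x_pred                      # measurement residual
    S = H @ P_pred @ H.T + R                # residual covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # optimal Kalman gain (eq. 7)
    x = x_pred + K @ y                      # updated state (eq. 5)
    P = P_pred - K @ H @ P_pred             # updated covariance (eq. 6)
    return x, P

H = np.array([[1.0, 0.0]])                  # measure position only
R = np.array([[0.25]])
x_pred = np.array([1.0, 1.0])
P_pred = np.array([[2.01, 1.0], [1.0, 1.01]])

x, P = update(x_pred, P_pred, np.array([1.2]), H, R)
```

Note how the measurement shrinks the position variance: `P[0, 0]` drops below its predicted value of 2.01, as equation 6 guarantees.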

These equations can be applied to any problem that you can formulate in state space form.
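Putting the two steps together, a complete (if minimal) filter run for the constant-velocity example might look like the following sketch. All numbers here are illustrative assumptions; the true state is simulated alongside the filter so the estimate can be compared against it.

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity system
H = np.array([[1.0, 0.0]])               # measure position only
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[0.25]])                   # measurement noise covariance

rng = np.random.default_rng(42)
x_true = np.array([0.0, 1.0])            # simulated true state
x_hat = np.array([0.0, 0.0])             # initial guess (wrong velocity)
P = 10.0 * np.eye(2)                     # vague initial uncertainty

for _ in range(50):
    # Simulate the true system and a noisy observation (eqs. 1 and 2)
    x_true = F @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.multivariate_normal([0.0], R)
    # Predict (eqs. 3-4)
    x_hat, P = F @ x_hat, F @ P @ F.T + Q
    # Update (eqs. 5-7)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (z - H @ x_hat)
    P = P - K @ H @ P

print(x_hat)  # close to the true position and velocity
```

Notice that the filter recovers the velocity even though only position is ever observed, because $F$ links the two states.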

## 3. Deriving the Kalman State Estimation Equations

The state estimation equation (3) is a straightforward application of equation 1 to calculate an estimate of $x$ given its prior state and any known control inputs (the zero-mean noise term is dropped). The covariance estimation equation (4) is simply the variance³ of the state estimate, i.e. $(P_{t|t-1}=E[ (x_t - \hat{x}_{t|t-1}) (x_t - \hat{x}_{t|t-1})^T ])$. Since we do not know the value of ($x_t$), we derive this by subtracting equation 3 from equation 1 and substituting:

$P_{t|t-1}=E[ (F_t( x_{t-1} - \hat{x}_{t-1})+w_t) (F_t( x_{t-1} - \hat{x}_{t-1})+w_t)^T ]$
$P_{t|t-1}= F_t E[(x_{t-1} - \hat{x}_{t-1})(x_{t-1} - \hat{x}_{t-1})^T] F_t^T + E[w_tw_t^T]$
$P_{t|t-1}= F_t P_{t-1} F_t^T + Q_t$
(the cross terms vanish because the noise $w_t$ is independent of the estimation error at $t-1$).

## 4. Deriving the Kalman Measurement Estimation Equations

The update of the state given an observation (equation 5) can be thought of as the estimated state updated with some amount of the residual. If the weight of the residual is 1 then the state estimate is updated to match the observation; conversely, if the weight of the residual is 0 then the state estimate ignores the observation.

$\hat{x}_{t}=\hat{x}_{t|t-1}+K_t(z_t-H_t\hat{x}_{t|t-1})$
$\mu_{updated}=\mu_{estimated}+ \left( \frac{H\sigma_{estimated}^2}{H^2\sigma_{estimated}^2+\sigma_{observed}^2} \right) (\mu_{obs}-H\mu_{estimated})$

This equation is justified by the properties of Gaussian statistics, namely that when you multiply two Gaussian pdfs the result is also (proportional to) a Gaussian pdf⁴. Multiplying exponentials is equivalent to summing their exponents, which means that ($\mu_{updated}$) and ($\sigma_{updated}$) are functions of the previous ($\mu$) and ($\sigma$). As can be seen above, the updated mean is a weighted sum of the means. It matches the known result for multiplying normal pdfs, ($\mu_{updated}=\frac{\sigma_1^{-2}\mu_1+\sigma_2^{-2}\mu_2}{\sigma_1^{-2}+\sigma_2^{-2}}$), with updated variance ($\sigma_{updated}^2=\frac {\sigma_1^2\sigma_2^2} {\sigma_1^2+\sigma_2^2}$).
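This correspondence is easy to check numerically in the scalar case with $H = 1$: the Kalman update and the product-of-Gaussians formulas give the same mean and variance. A sketch with arbitrary illustrative numbers:

```python
# Scalar case with H = 1: the Kalman update equals the
# product-of-Gaussians formulas quoted above.
mu1, var1 = 2.0, 4.0     # estimated state (mean, variance)
mu2, var2 = 3.0, 1.0     # observation (mean, variance)

# Kalman form (equations 5 and 6 with H = 1)
K = var1 / (var1 + var2)              # scalar Kalman gain
mu_kalman = mu1 + K * (mu2 - mu1)
var_kalman = var1 - K * var1

# Product-of-Gaussians form
mu_prod = (mu1 / var1 + mu2 / var2) / (1 / var1 + 1 / var2)
var_prod = (var1 * var2) / (var1 + var2)

print(mu_kalman, var_kalman)   # 2.8 0.8 -- identical to mu_prod, var_prod
```

Note also that the fused mean (2.8) sits much closer to the observation (3.0) than to the estimate (2.0), because the observation has the smaller variance.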

The update of the covariance matrix (equation 6) may be constructed similarly to the state update equation.
$P_{t}=P_{t|t-1}-K_tH_tP_{t|t-1}$
$\sigma_{updated}^2=\sigma_{estimated}^2 - \left( \frac{H\sigma_{estimated}^2}{H^2\sigma_{estimated}^2+\sigma_{observed}^2} \right) H\sigma_{estimated}^2$
It may be thought of as the estimated variance ($\sigma_{estimated}^2$) reduced in proportion to the observed variance. As can be seen above, the updated variance is a linear function of the estimated variance and therefore fits the form expected of a normal distribution.

1. https://en.wikipedia.org/wiki/Kalman_filter#Applications
2. A linear function, or more strictly speaking an affine function, is the equation of a line ($mx + c$), i.e. a value multiplied by a constant plus a constant.
3. ($Var = E([x-E(x)]^2)$)
4. See for example http://www.johndcook.com/blog/2012/10/29/product-of-normal-pdfs/