# First-Hand:The Unscented Transform

### From GHN

(→What makes Kalman Filters so popular yet problematic for solving nonlinear problems?) |
(→How does the Unscented Transform (UT) work?) |
||

Line 33: | Line 33: | ||

In other words, the most general solution is one in which the state of the system is recalculated at each timestep based on the newest observation combined with all previous ones. Such an approach would have been unthinkable twenty years ago, but it will likely be commonplace in the near future as computers become more powerful. The Kalman filter will always have its place because there will always be problems where the fastest computers aren't fast enough, but its role may diminish. | In other words, the most general solution is one in which the state of the system is recalculated at each timestep based on the newest observation combined with all previous ones. Such an approach would have been unthinkable twenty years ago, but it will likely be commonplace in the near future as computers become more powerful. The Kalman filter will always have its place because there will always be problems where the fastest computers aren't fast enough, but its role may diminish. | ||

− | + | == How does the Unscented Transform (UT) work? == | |

What was most striking to people about the UT was not the accuracy so much as the ease with which it could be implemented. There was no longer a need to derive a linearized approximation which would then have to be coded up for use in the filter. | What was most striking to people about the UT was not the accuracy so much as the ease with which it could be implemented. There was no longer a need to derive a linearized approximation which would then have to be coded up for use in the filter. | ||

Line 41: | Line 41: | ||

Put another way, the UT provides a mechanism for estimating what happens when a nonlinear operation is performed on a partially-characterized probability distribution. This is something that has to be done in almost any nontrivial tracking or control problem. If you know the mean position and associated error covariance defining the state of a satellite, for example, then to predict its future state requires a transformation of the current mean and covariance using nonlinear equations of motion. Technically that’s an ill-posed problem because it requires knowledge of the complete–though unknown–probability distribution defining the current state of the satellite. What I recognized was that in the absence of that knowledge, any assumed distribution with the appropriate mean and covariance is in theory as good as any other, so a choice can be made that makes the computations not just tractable, but easy. | Put another way, the UT provides a mechanism for estimating what happens when a nonlinear operation is performed on a partially-characterized probability distribution. This is something that has to be done in almost any nontrivial tracking or control problem. If you know the mean position and associated error covariance defining the state of a satellite, for example, then to predict its future state requires a transformation of the current mean and covariance using nonlinear equations of motion. Technically that’s an ill-posed problem because it requires knowledge of the complete–though unknown–probability distribution defining the current state of the satellite. What I recognized was that in the absence of that knowledge, any assumed distribution with the appropriate mean and covariance is in theory as good as any other, so a choice can be made that makes the computations not just tractable, but easy. | ||

− | The Kalman filter works because the second central moment of a probability distribution transforms linearly. That is, the covariance of the transformed distribution is just the transformed covariance of the original distribution. The covariance matrix used in the filter can be larger than the actual covariance of the unknown underlying error distribution because each operation that reduces the filter covariance reduces the true underlying error covariance as well. When trying to apply a nonlinear transformation, however, there is an enormous insurmountable technical problem: the nonlinearly-transformed covariance depends fundamentally on the characteristics of the unknown underlying probability distribution. Without knowing that distribution exactly the problem is ill-posed, but that's precisely the situation in virtually all real-world applications of the Kalman filter. Linearization hides the problem but not the effects because, while each transformation is linear, it's the wrong transformation! | + | The Kalman filter works because the second central moment of a probability distribution transforms linearly. That is, the covariance of the transformed distribution is just the transformed covariance of the original distribution. The covariance matrix used in the filter can be larger than the actual covariance of the unknown underlying error distribution because each operation that reduces the filter covariance reduces the true underlying error covariance as well. When trying to apply a nonlinear transformation, however, there is an enormous insurmountable technical problem: the nonlinearly-transformed covariance depends fundamentally on the characteristics of the unknown underlying probability distribution. Without knowing that distribution exactly the problem is ill-posed, but that's precisely the situation in virtually all real-world applications of the Kalman filter. Linearization hides the problem but not the effects because, while each transformation is linear, it's the wrong transformation! |

=== How did you make the switch from linear to nonlinear solutions for nonlinear problems? === | === How did you make the switch from linear to nonlinear solutions for nonlinear problems? === |

## Revision as of 16:02, 3 December 2012

**Contributed by** Jeffrey Uhlmann, University of Missouri

## What led you to inventing the Unscented Transform (UT)?

I began work at the Naval Research Laboratory in 1987, doing research that was relevant to tracking multiple objects ranging from particles in molecular dynamics simulations to missiles in SDI (aka Star Wars) applications. These involved high-dimensional, highly nonlinear motion models that took into account factors such as atmospheric drag at different altitudes, temperature changes, and even local gravitational effects. My original research focus was on spatial data structures, which are needed for quickly finding objects that are near other objects. For example, if you get a line-of-sight sensor measurement of some object, how do you determine which of the possibly thousands of objects in your database produced that measurement? The naive approach would be to check every object in the database to see which one is most likely to be found along that line of sight, but a better data structure can allow the appropriate object to be found much faster. Once found, its state—i.e., position, velocity, etc.—can be updated using a filtering algorithm, typically some variant of the Kalman filter.

The great thing about NRL is that it usually has the latest cutting-edge technologies, which for me at that time included a Connection Machine with up to 65 thousand simple processors and a Butterfly machine with 128 powerful processors. At that time there was a sense, especially at the Pentagon, that any computational bottlenecks that might arise in the tracking of large numbers of objects could be addressed simply with money for more processors. My colleagues and I showed that this was seriously mistaken because the algorithms scaled much worse than linearly; e.g., a doubling of the size of the problem might increase the computation time by a factor of thousand or more.

What was really needed were better algorithms rather than more processors. This was demonstrated when Mike Zuniga and I used a desktop workstation to perform the first-ever simulation of the tracking problem for a million objects in 1990—something that was well beyond what anyone was doing on any machine at that time. We were able to do it by using better algorithms and data structures, and this helped push the SDI program to focus its attention more on computer science than on computers. I told some of this story in an article in American Scientist in 1992.

Around this time I received an email from Hugh Durrant-Whyte, who had recently established a data fusion group at Oxford. He was spearheading the application of the Kalman filter and related tracking techniques to problems in robotics. He asked if I'd like to join his group, and what he described sounded very interesting. So I applied for a 2-year sabbatical from NRL to pursue a doctorate in robotics at Oxford.

### B. Tracking research at Oxford University

I arrived at Oxford University on 29 September 1993. When I met with Hugh for the first time, he introduced me to a completely new problem: how to apply the Kalman filter to track objects in an environment so that an autonomous vehicle can self-localize. In other words, the goal was to have a vehicle identify natural features to use as landmarks so that it always knows where it is. That problem fascinated me because it seemed to require a completely new theoretical generalization of the Kalman filter. In the process of developing the mathematical framework for that I began to re-think other aspects of the way the Kalman filter was used in practice. One of those was the way it was applied to nonlinear systems.

## What are the challenges of nonlinear modeling?

Almost any real-world tracking or control problem involves the use of nonlinear equations defining how the state of a system changes in time (e.g., the motion of a ballistic object) and how sensor measurements taken in one coordinate system are converted to another coordinate system. The Kalman filter is defined for linear models, so the standard approach for accommodating a nonlinear model was to linearize it, i.e., find the linear transformation that best approximates the nonlinear one. I had never really thought deeply about linearization even though I was aware of its limitations. For example, the SDI program funded a group of physicists to develop an ultra-high-fidelity model of a warhead entering the atmosphere, and the result was a high-dimensional set of nonlinear equations taking into effect atmospheric drag, heat dissipation, and local gravitational variations along the trajectory. It was very impressive, but after linearization it produced terrible results. Traditionally people would evaluate nonlinear functions/transformations by linearizing them, i.e., deriving the linearized approximation (Jacobian matrix). The result can be very tedious to obtain or not even well-defined in some cases, and even when it’s done it’s still just a linear approximation. In that particular case I had attributed the poor performance to a failure to tune the filter properly to exploit the higher fidelity model. After a couple of months working on my map-building system at Oxford, however, I came to understand the problem more deeply.

## What makes Kalman Filters so popular yet problematic for solving nonlinear problems?

Stepping back a bit, the Kalman filter is widely used is because it provides an elegant mathematical framework that captures—at least for linear systems—all aspects of the tracking problem. First and foremost, it provides a way to represent the uncertainty associated with the state of a system, e.g., the position and velocity of a moving object, in the form of a mean vector and an associated error covariance matrix. This is significant because it isn't clear a priori how best to represent uncertainty. For example, it could be represented in the form of an error radius, a confidence factor of some sort, or something else entirely.

Kalman showed that if the covariance matrix somehow relates to the error associated with the mean position vector, then his filtering algorithm will reduce that error in a particular fashion as observations are made. He further showed that his algorithm is the best possible if the error in the mean is Gaussian-distributed and the covariance matrix is equal to the second central moment (i.e., the covariance) of that Gaussian distribution. In fact, that's why that matrix is almost universally referred to as a "covariance matrix," even though the Kalman filter doesn't require an assumption of Gaussianity or even that the covariance matrix exactly represents the second central moment of some other underlying probability distribution. Unfortunately, many later authors chose to impose a strict Bayesian interpretation on the Kalman filter which does make these additional assumptions, and this has resulted in a continuing misconception that the Kalman filter requires Gaussianity assumptions for theoretical rigor. What the Kalman filter requires is a set of linear transformations, typically referred to as models, describing how the system evolves through time and how to transform between the system coordinates and the coordinates of the sensor observations. (It also requires that error processes be independent or that their correlations are known exactly.)

The Kalman filter’s importance stems mostly from the way it formulates the overall structure of a filtering system, but there's another reason why it became so widely used: Apollo. The Kalman filter was studied and used in at least two systems onboard the Apollo spacecraft. That really sanctified it to such an extent that afterward nobody could propose anything in the area of tracking or control without saying that it was based on Kalman filtering. The problem is that even on Apollo there were practical issues with its performance due to nonlinear effects. One Apollo engineer described to me that it was used for attitude control for docking, but with a human (astronaut) in the loop. The filter would be turned on and it was understood that it would only give reliable outputs for around eight seconds, after which it would be recycled or re-started. That was okay because it gave good enough snapshots of information to help with steering the spacecraft.

In other post-Apollo applications the use of the so-called Extended Kalman Filter (EKF), which is basically the Kalman filter plus linearization, proved to be more problematic because it often required an enormous amount of tuning and tweaking to produce decent, stable results. The Kalman filter had to be used, but many implementers regarded it as a kind of stone soup in which any good performance was likely due as much to the tuning efforts of the implementer as it was to the filter architecture itself. Once completed, the system was typically so tuned to a particular scenario that any significant deviation would cause the filter to diverge (fail). One old-timer described the EKF as a "ball of chalk" because of its tendency to fall apart if stressed.

Unfortunately, filtering technology today is not really that much better. What has changed is the speed of processing which allows many more observations to be incorporated to keep the filter on track. A cynic might say that enough sensor observations can make anything easy to track, but good engineering still plays as big or even bigger role in maximizing accuracy and stability. So, going back to the merits of the Kalman filter, whether it can be applied rigorously in a given application is possibly not as important as the fact that it provides a framework for structuring the tuning process to make it work. Someone might ask whether there is a better technique, yet to be discovered, that can be applied rigorously to practical problems, but the answer is no. The Kalman filter belongs to a family of approaches that maintains a fixed amount of information about the state of a system while updating that state with observations of it. When it comes to nonlinear systems this isn't possible because each observation adds new information that can't simply be rolled into a fixed-sized estimate. The only possible general solution is to have the filter maintain information that is roughly proportional in size to the entire observation history.

In other words, the most general solution is one in which the state of the system is recalculated at each timestep based on the newest observation combined with all previous ones. Such an approach would have been unthinkable twenty years ago, but it will likely be commonplace in the near future as computers become more powerful. The Kalman filter will always have its place because there will always be problems where the fastest computers aren't fast enough, but its role may diminish.

## How does the Unscented Transform (UT) work?

What was most striking to people about the UT was not the accuracy so much as the ease with which it could be implemented. There was no longer a need to derive a linearized approximation which would then have to be coded up for use in the filter.

What the Unscented Transform does is to replace the mean vector and its associated error covariance matrix with a special set of points with the same mean and covariance. In the case of the mean and covariance representing the current position estimate for a target, the UT is applied to obtain a set of points, referred to as sigma points, to which the full nonlinear equations of motion can be applied directly. In other words, instead of having to derive a linearized approximation, the equations could simply be applied to each of the points as if it were the true state of the target. The result is a transformed set of points, and the mean and covariance of that set represents the estimate of the predicted state of the target.

Put another way, the UT provides a mechanism for estimating what happens when a nonlinear operation is performed on a partially-characterized probability distribution. This is something that has to be done in almost any nontrivial tracking or control problem. If you know the mean position and associated error covariance defining the state of a satellite, for example, then to predict its future state requires a transformation of the current mean and covariance using nonlinear equations of motion. Technically that’s an ill-posed problem because it requires knowledge of the complete–though unknown–probability distribution defining the current state of the satellite. What I recognized was that in the absence of that knowledge, any assumed distribution with the appropriate mean and covariance is in theory as good as any other, so a choice can be made that makes the computations not just tractable, but easy.

The Kalman filter works because the second central moment of a probability distribution transforms linearly. That is, the covariance of the transformed distribution is just the transformed covariance of the original distribution. The covariance matrix used in the filter can be larger than the actual covariance of the unknown underlying error distribution because each operation that reduces the filter covariance reduces the true underlying error covariance as well. When trying to apply a nonlinear transformation, however, there is an enormous insurmountable technical problem: the nonlinearly-transformed covariance depends fundamentally on the characteristics of the unknown underlying probability distribution. Without knowing that distribution exactly the problem is ill-posed, but that's precisely the situation in virtually all real-world applications of the Kalman filter. Linearization hides the problem but not the effects because, while each transformation is linear, it's the wrong transformation!

### How did you make the switch from linear to nonlinear solutions for nonlinear problems?

I came to realize that we were compounding the inherent technical problem of insufficient knowledge of the true underlying error distribution with the use of a crude approximation to the true transformation equations. That’s when it hit me: I can encode the given mean and covariance information exactly in the form of a set of points—called sigma points—which has the same mean and covariance. This set of points can then serve as a proxy for the unknown error distribution in the sense that, if we don't know which distribution corresponds to the given mean and covariance, then any particular choice of distribution is as good as any other for estimating the transformed state. What makes this point set attractive is that we can transform it exactly using our nonlinear transformations with no need for linearization. More specifically, we can just apply our nonlinear transformation to each of the sigma points with the resulting set of points representing the transformed distribution. We compute the mean and covariance of the transformed sigma points and this will give us our estimate of the nonlinearly transformed mean and covariance that we need for the Kalman filter. It's simple, elegant, and no other approach can provably do better without making other assumptions. That's the Unscented Transform (UT).

Some people mistakenly think that the sigma points used by the UT are somehow random samples analogous to Monte Carlo methods, when in fact they are deterministically computed to exactly encode the given mean and covariance information. In fact, the UKF produces exactly the same results as the Kalman filter in the linear case—they are provably equivalent—but the UKF is defined and operates the same way with nonlinear transformations, though of course the results are necessarily suboptimal.

### How long did it take to develop the Unscented Transform?

There was nothing too hard about developing the transform itself other than coming up with algorithms for generating the required sets of sigma points. At first I worked out how to construct the general algorithms in N dimensions for finding the minimal set of N+1 sigma points and the minimal symmetric set of 2N points to encode a given N-dimensional mean and covariance. For experimental verification, I tested it out on some simple examples involving nonlinear transformations of sensor observations. These verified that it produced better results than linearization and in some cases much better results. My feeling was that in more stressing applications the difference would be even more pronounced.

### Did you develop this on your own or collaborate with others?

I described the approach to people in my lab, and a couple of them, Simon Julier and Ben Quine, were particularly interested. They became really convinced when they ran it on their own highly nonlinear problems. Later Simon developed algorithms for generating sets of sigma points for exploiting information about higher moments of a distribution when that information is available.

### How did you your long-term collaboration with Simon come about?

Simon was just beginning his doctoral studies in the Robotics Research Group when we became active collaborators in 1994. We’re extremely compatible in many ways, not least of which is that we both really enjoy what we do and can talk for hours and hours about it. Simon was working on a problem that had the potential to really demonstrate the value of the UT, so that formed the basis of our initial collaboration. In addition to becoming an equal partner on the theory side, Simon is responsible for most of the empirical results in our various papers. A year or so after I returned to NRL in 1995, Simon joined me in the same branch and we continued our collaborations with funding from the Office of Naval Research.

### What’s the difference between the Unscented Transform (UT) and Unscented Kalman Filter (UKF)?

The Unscented Transform (UT) is applicable whenever a state estimate needs to be transformed from one coordinate system to another. The UKF is just the use of the UT within a Kalman filter to deal with those kinds of transformations.

### What’s the appeal of the Unscented Transform?

### Where do people use the UT and UKF?

Over the ten years following 1994 the Unscented Transform (UT) and Unscented Kalman Filter (UKF) were widely adopted in preference to linearization-based techniques, e.g., the Extended Kalman Filter (EKF), because they are often more accurate and easier to implement. As a consequence Simon and I were invited to submit a paper for a special issue of Proceedings of the IEEE in March 2004. Both separately and jointly we applied the Unscented approach to a variety of challenging tracking, control, and estimation problems. Today there are a plethora of methodologies built on top of the UT that are tailored to deal with the effects of nonlinearities in applications ranging from financial and weather forecasting to the Mars rover program, where I was an evaluator for testing of filter modules for the Mars Science Laboratory.

Closer to home I mentored a local high school student, Anand Palaniappan, for his project to computerize asteroid tracking. He was a semifinalist in both the Siemens Westinghouse and Intel competitions.

### What’s with the Name “Unscented”?

Initially I only referred to it as the “new filter.” Needing a more specific name, people in my lab began referring to it as the “Uhlmann filter,” which obviously isn’t a name that I could use, so I had to come up with an official term. One evening everyone else in the lab was at the Royal Opera House, and as I was working I noticed someone’s deodorant on a desk. The word “unscented” caught my eye as the perfect technical term. At first people in the lab thought it was absurd—which is okay because absurdity is my guiding principle—and that it wouldn’t catch on. My claim was that people simply accept technical terms as technical terms: for example, does anyone think about why a tree is called a tree?

Within a few months we had a speaker visit from another university who talked about his work with the “unscented filter.” Clearly he had never thought about the origin of the term. The cover of the issue of the March 2004 *Proceedings* we’re discussing right now has “Unscented” in large letters on the cover, which shows that it has been accepted as the technical term for that approach.

## References

Simon J. Julier and Jeffrey K. Uhlmann, “Unscented Filtering and Nonlinear Estimation,” *Proc. IEEE* Vol. 92, No. 3 (March 2004), p. 401-22.

Simon J. Julier and Jeffrey K. Uhlmann, “Corrections to ‘Unscented Filtering and Nonlinear Estimation,’” *Proc. IEEE* Vol. 92, No. 12 (December 2004), p. 1958.

---

S. J. Julier, J. K. Uhlmann, and Hugh F. Durrant-Whyte, “A New Approach for Filtering Nonlinear Systems,” *1995 American Control Conference*, Seattle, WA, p. 1628-32.

S. J. Julier and J. K. Uhlmann, “A General Method for Approximating Nonlinear Transformations of Probability Distributions,” *Robotics Research Group Technical Report*, Department of Engineering Science, University of Oxford: November 1996, p. 1-27.

S. J. Julier and J. K. Uhlmann, “A New Extension of the Kalman Filter to Nonlinear Systems,” *Proc. AeroSense: 11th Int. Symp. Aerospace/Defense Sensing, Simulation and Controls* (1997), p. 182-93.

Jeffrey K. Uhlmann, “Algorithms for Multiple-Target Tracking,” *American Scientist* 80, no. 2 (March-April 1992), p. 128-41.

Jeffrey K. Uhlmann, *Dynamic Map Building and Localization: New Theoretical Foundations*, University of Oxford Doctor of Philosophy thesis, Trinity Term 1995.

Jeffrey K. Uhlmann and Miguel R. Zuniga, “Results of an Efficient Gating Algorithm for Large-Scale Tracking Scenarios,Naval Research Reviews 1 (Spring 1991), p. 24-9.

Jeffrey K. Uhlmann, Miguel R. Zuniga, and J. Michael Picone, “Efficient Approaches for Report/Cluster Correlation in Multitarget Tracking Systems,” *NRL Report 9281*, 31 December 1990, p. 1-14.