Outlier Removal

by Shaun Snapp on July 20, 2010

What This Article Covers

  • What is outlier removal?
  • How is outlier removal commonly abused?

Outlier Removal

Outlier removal is always a topic of great interest on forecasting projects. Typically, people on the project will recommend removing outliers from the demand history. This topic confuses many people, yet the rule on outlier removal is fairly simple: if the outlier in question has a high probability of repeating in the future, it should not be removed.

A finance firm that wants to overstate the stock market's return would simply drop the period from roughly 1975 to 1983 and begin counting from 1983.

How Outlier Removal is Abused

Outlier removal is often used by those who want to tell a different story than the one the data tell. The financial industry frequently uses misleading outlier removal to persuade investors to place their bets on things that are far less stable than they think. A perfect example is the return on the stock market. To drum up interest in the stock market, investment “professionals” will often quote the market's return beginning from a particular point in time. What they do not mention is that just prior to that point in time, the stock market went through a major correction. When pressed on this point, the investment professional, surprised that anyone saw fit to perform independent research, will state that the period in question was an “outlier” or “one-time” event and therefore should be removed from the overall calculation. However, investors who had been in the market at that time would have suffered the decline.

Outlier removal has also been used to falsify research results. For instance, aspartame (better known under the brand name NutraSweet) was only able to make it through FDA approval because rats that were given NutraSweet and were observed to have large tumors were removed from the sample.

Aspartame could not have been approved without the removal of “outliers.”

NutraSweet is just the tip of the iceberg of misleading outlier removal in studies submitted to the FDA. Pharmaceutical trials remove patients whose adverse reactions are so severe that they drop out of the study. Because these patients are not counted, their adverse reactions do not show up in the results. The list of biased outlier removal in pharmaceutical clinical trials goes on and on. It is simply too easy to remove outliers to steer the data toward whatever result is desired, which is one reason one must be so careful with outlier removal.

How One Leading Forecasting Vendor Describes Outlier Removal

Readers of this blog will know that I frequently discuss a very easy-to-use and powerful forecasting application called Demand Works Smoothie. The Smoothie user guide has the following to say about outlier removal:

History Adjustments: Check for outliers using the filter selection above the navigation tree. Some outliers will be repeatable, unknown events. It’s not worth making adjustments for these items since doing so will falsify and minimize real demand variation. Another excellent practice is to borrow history from similar items for new products. History adjustments do not carry upwards as you work with aggregations, since aggregations sum actual history, so you do not need to worry about double-counting demand. – Demand Works User Guide

Model Effectiveness

Aside from the potential for abuse, Michael Gilliland makes the point that outlier removal makes any forecast model appear more predictive than it really is:

An outlier is an observation that is well outside the range of expected values, such as extremely high or extremely low sales in a given week. While it is convenient to ignore outliers in the model construction process, this can lead to unwarranted overconfidence in the model’s ability to forecast. Outliers tell us something that shouldn’t be ignored — that behavior we are trying to forecast is more erratic than we would like it to be. These kinds of extreme data points have happened in the past, and we are foolish to think extreme data points won’t happen again in the future.
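To make Gilliland's point concrete, the short Python sketch that follows uses a made-up weekly sales series (the numbers are purely illustrative, not from any real data set) containing two extreme weeks. It drops points more than two standard deviations from the mean, a common rule of thumb assumed here rather than taken from Gilliland, and compares the spread of the history before and after. The helper name `drop_outliers` is hypothetical.

```python
import statistics

def drop_outliers(history, k=2.0):
    """Hypothetical helper: remove points more than k sample standard
    deviations from the mean (the practice Gilliland cautions against)."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return [x for x in history if abs(x - mean) <= k * sd]

# Made-up weekly sales with one very high week and one very low week.
history = [100, 104, 98, 101, 97, 103, 99, 102, 180, 100, 96, 105, 101, 20, 99, 103]
trimmed = drop_outliers(history)

sd_full = statistics.stdev(history)
sd_trimmed = statistics.stdev(trimmed)

# How unusual would a repeat of the largest past week look under each estimate?
peak = max(history)
z_full = (peak - statistics.mean(history)) / sd_full
z_trimmed = (peak - statistics.mean(trimmed)) / sd_trimmed

print(f"std dev, full history:     {sd_full:.1f}")
print(f"std dev, outliers removed: {sd_trimmed:.1f}")
print(f"z-score of the peak week:  {z_full:.1f} (full) vs {z_trimmed:.1f} (trimmed)")
```

With this made-up series, removing the two extreme weeks shrinks the standard deviation from about 29 units to under 3, and a repeat of the peak week goes from a less-than-3-sigma event to a roughly 30-sigma one. Any prediction interval built on the trimmed history would treat something that has already happened as practically impossible, which is the overconfidence Gilliland warns about.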

This leads into much of the work of Nassim Nicholas Taleb, author of The Black Swan, in which he describes the strong natural human tendency to ignore (or remove as outliers) the impact of highly improbable events. He states that companies that perform financial forecasting remove improbable events from their forecast models, and that this not only reduces the real forecastability of those models but also makes the financial system much less robust. Taleb's own approach is the opposite, as this Amazon reviewer describes:

In one of the many humorous anecdotes that seem to comprise this entire book, Taleb recounts how he learned his extreme skepticism from his first boss, a French gentleman trader who insisted that he should not worry about the fluctuating values of economic indicators. (Indeed, Taleb proudly declares that, to this day, he remains blissfully ignorant of supposedly crucial “indicators” like housing starts and consumer spending. This is a shocking statement from a guy whose day job is managing a hedge fund.) Even if these “common knowledge” indicators are predictive of anything (dubious – see above), they are useless to you because everyone else is already accounting for them. They are “white swans,” or common sense. Regardless of their magnitude, white swans are basically irrelevant to the trader – they have already been impounded into the market. In this environment, one can only profitably concern oneself with those bets which others are systematically ignoring – bets on those highly unlikely, but highly consequential events that utterly defy the conventional wisdom. What Taleb ought to worry about, the Frenchman warned, was not the prospect of a quarter-percent rise in interest rates, but a plane hitting the World Trade Center!

Of course, supply chain forecasting does not compete against other market participants for profit, so the intent of the forecast is different. However, it is interesting that while the standard approach is to remove outliers, Taleb's approach is to keep them in and to build a trading strategy that focuses on them.

Taleb goes on to say the following in this area:

First, it is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility. Second, it carries an extreme impact. Third, in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.

Here he describes the natural human inclination to remove outliers from our historical record.

Outlier Identification

Demand Works, like a number of other forecasting applications, can identify outliers using a simple parameter that can be set high or low, measured as a number of standard deviations from the series mean. However, this identification is only for reporting; the outliers are still included in the history used to generate the forecast. Flagged outliers can then be reported on and receive the planner's attention.
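A minimal Python sketch of this flag-but-don't-remove behavior follows. It is not Demand Works' implementation: the function names, the `k` threshold parameter, and the simple mean forecast are hypothetical stand-ins, and the series is made up. Points more than `k` sample standard deviations from the series mean are reported, while the forecast is computed from the full, unadjusted history.

```python
import statistics
from dataclasses import dataclass

@dataclass
class FlaggedPoint:
    index: int   # position in the history
    value: float
    z: float     # distance from the mean in sample standard deviations

def flag_outliers(history, k=3.0):
    """Hypothetical report: list points more than k standard deviations from
    the series mean. The history itself is never modified."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    if sd == 0:
        return []
    return [FlaggedPoint(i, x, (x - mean) / sd)
            for i, x in enumerate(history) if abs(x - mean) > k * sd]

def naive_mean_forecast(history, horizon):
    """Hypothetical stand-in forecast: the mean of the full, unadjusted history."""
    level = statistics.mean(history)
    return [level] * horizon

# Made-up demand history with one spike at index 7.
history = [52, 48, 50, 55, 47, 51, 49, 120, 53, 50, 46, 54]

for k in (2.0, 4.0):
    flags = flag_outliers(history, k)
    print(f"k={k}: flagged {[(f.index, f.value, round(f.z, 2)) for f in flags]}")

# Flagged or not, the spike still feeds the forecast.
print(naive_mean_forecast(history, horizon=3))
```

Raising `k` from 2 to 4 makes the spike disappear from the report, which is what setting the parameter “high or low” amounts to. Because the spike inflates the very standard deviation it is measured against, a short series with one large spike can slip under a high threshold. Either way the forecast is unchanged, because flagging never alters the history.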

Conclusion

Outlier removal has been greatly misused in the past and will likely be greatly misused in the future. So-called one-time events are part of the historical record, and there is no reason to assume the future will be any more stable than the past. One should be very careful when removing outliers from forecasts, and particularly careful when evaluating research where outliers have been removed, as this is a telltale sign that the research has been manipulated.

References

“The Business Forecasting Deal: Exposing Myths, Eliminating Bad Practices, Providing Practical Solutions,” Michael Gilliland, Wiley and SAS Business Series, 2010

“The Black Swan: The Impact of the Highly Improbable,” Nassim Nicholas Taleb, Random House, 2007

Demand Works Smoothie User Guide

Amazon.com customer review of “The Black Swan,” http://www.amazon.com/Black-Swan-Impact-Highly-Improbable/dp/1400063515

1 Comment

Tom Reilly December 23, 2010 at 1:38 pm

The implications of not adjusting for outliers have been well documented in many statistical journals. I will point you to the great work of Ruey Tsay here: http://www.unc.edu/~jbhill/tsay.pdf

Your discussion of financial data and NutraSweet is understood, but when it comes to supply chain, adjusting for outliers is critical. And it is equally important how you identify them!

As you point out, most systems use a simple approach of calling a point an outlier when it is 2 or 3 standard deviations from the mean and then asking you how many iterations of removing and adjusting to perform. This approach is very simple and misses other important outliers that distort the model and forecast. You need to identify the outliers while you are building the model AND do a final check at 2 or 3 standard deviations at the end of the process. A fun example we like to torture our competition with is the series 1,9,1,9,1,9,1,5. Where is the outlier? Well, we can see that the 5 is unusual, and we could call it an inlier, as it is “too good to be true” and sits at the mean. Simple outlier schemes completely miss this outlier, and the forecast suffers. The 1,9 example is contrived, but it is the kind of pattern that happens in datasets we see all the time.
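The commenter does not describe his own method, so the following is only an illustrative Python sketch under stated assumptions. It applies a plain z-score rule (threshold `k = 2`, sample standard deviation) to the raw series 1,9,1,9,1,9,1,5, and then applies the same rule to the residuals of a seasonal naive model with a period of 2, where each value is forecast by the value two periods earlier. The seasonal naive model and the function names are assumptions chosen for illustration, not the commenter's software.

```python
import statistics

def zscore_flags(values, k=2.0):
    """Indexes whose value lies more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [i for i, v in enumerate(values) if sd > 0 and abs(v - mean) > k * sd]

def seasonal_naive_residual_flags(series, period=2, k=2.0):
    """Assumed model: forecast each value by the value one period back, then
    apply the z-score rule to the model's residuals instead of the raw data."""
    residuals = [series[t] - series[t - period] for t in range(period, len(series))]
    # Residual j belongs to series index j + period.
    return [j + period for j in zscore_flags(residuals, k)]

series = [1, 9, 1, 9, 1, 9, 1, 5]
print(zscore_flags(series))                   # -> []
print(seasonal_naive_residual_flags(series))  # -> [7]
```

The raw-data rule flags nothing, because the 5 sits almost exactly at the series mean of 4.5. Once the alternating pattern is modeled, the 5 leaves a residual of -4 where every other residual is 0, and it is flagged. With only six residuals, the largest |z| any single point can reach is about 2.04, so a 3-standard-deviation cutoff could never fire on these residuals.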
