Saturday, June 13, 2020

Death by rounding

Round, round, round, round, I get around

Rounding errors are one of those basic things that every technical person thinks they're on top of and won't happen to them, but the problem is, it can and does happen to good people, sometimes with horrendous consequences. In this blog post, I'm going to look at rounding errors, show you why they can creep in, and provide some guidelines you should follow to keep you and your employer safe. Let's start with some real-life cases of rounding problems.


(Rounding requires a lot of effort. Image credit: Wikimedia Commons. License: Public Domain)

Rounding errors in the real world

The wrong rounding method

In 1992, there was a state-level election in Schleswig-Holstein in Germany. The law stated that every party that received 5% or more of the vote got a seat, but there were no seats for parties with less than 5%. The software that calculated results rounded the results up (ceil) instead of rounding the results down (floor) as required by law. The Green party received 4.97% of the vote, which was rounded up to 5.0%, so it appeared the Green party had won a seat. The bug was discovered relatively quickly, and the seat was reallocated to the Social Democrats who gained a one-seat majority because of it [Link]. 

Cumulative rounding

The more serious issue is cumulative rounding errors in real-time systems. Here a very small error becomes very important when it's repeatedly or cumulatively added.

The Vancouver Stock Exchange set up a new index in January 1982, with a value set to 1,000. The index was updated with each trade, but the index was rounded down to three decimal places (truncated) instead of rounding to the nearest decimal place. The index was calculated thousands of times a day, so the error was cumulative. Over time, the error built up from something not noticeable to something very noticeable indeed. The exchange had to correct the error; on Friday November 25th, 1983 the exchange closed at 524.811, the rounding error was fixed, and when the exchange reopened, the index was 1098.892 - the difference being solely due to the rounding error bug fix [Link].

The most famous case of cumulative rounding errors is the Patriot missile problem in Dharan in 1991. A Patriot missile failed to intercept a Scud missile, which went on to kill 28 people and injured a further 98. The problem came from the effects of a cumulative rounding error. The Patriot system updated every 0.1s, but 0.1 can't be represented exactly in a fixed point system, there's rounding, which in this case was rounding down. The processors used by the Patriot system were old 24-bit systems that truncated the 0.1 decimal representation. Over time, the truncation error built up, resulting in the Patriot missile incorrectly responding to sensor data and missing the Scud missile [Link].  

Theoretical explanation of rounding errors

Cumulative errors

Fairly obviously, cumulative errors are a sum:


E = ∑e

where E is the cumulative error and e is the individual error. In the Vancouver Stock Exchange example, the mean individual rounding error when rounding to three decimal places was 0.0005. From Wikipedia, there were about 3,000 transactions per day, and the period from January 1st 1982 when the index started to November 25th, 1983 when the index was fixed was about 473 working days. This gives an expected cumulative error of about 710, which is in the ballpark of what actually happened. 

Of course, if the individual error can be positive or negative, this can make the problem better or worse. If the error is distributed evenly around zero, then the cumulative error should be zero, so things should be OK in the long run. But even a slight bias will eventually result in a significant cumulative error - regrettably, one that might take a long time to show up.

Although the formula above seems trivial, the point is, it is possible to calculate the cumulative effect of rounding errors.

Combining errors

When we combine numbers, errors can really hurt depending on what the combination is. Let's start with a simple example, if:


z = x - y 

and:
 
sis the standard error in z
sis the standard error in x
sis the standard error in y

then

sz  = [s2x + s2y]1/2

If x and y are numerically close to one another, errors can quickly become very significant. My first large project involved calculating quantum states, which included a formula like z = x - y. Fortunately, the rounding was correct and not truncated, but the combination of machine precision errors and the formulae above made it very difficult to get a reliable result. We needed the full precision of the computer system and we had to check the library code our algorithms used to make sure rounding errors were correctly dealt with. We were fortunate in that the results of rounding errors were obvious in our calculations, but you might not be so fortunate.


Ratios are more complex, let's define:

z = x/y

with the s values defined as before, then:

sz /z = [(sx/x)2 + (sy/y)2]0.5

This suffers from the same problem as before, under certain conditions, the error can become very significant very quickly. In a system like the Patriot missile, sensor readings are used in some very complex equations. Rounding errors can combine to become very important.

The takeaway is very easy to state: if you're combining numbers using a ratio or subtracting them, rounding (or other errors) can hurt you very badly very quickly.

Insidious rounding errors

Cumulative rounding errors and the wrong type of rounding are widely discussed on the internet, but I've seen two other forms of rounding that have caught people out. They're hard to spot but can be damaging.

Rounding in the wrong places - following general advice too closely

Many technical degrees include some training on how to present errors and significant digits. For example, a quantity like 12.34567890 ∓ 0.12345678 is usually written 12.3 ∓ 0.1. We're told not to include more significant digits than the error analysis warrants. Unfortunately, this advice can lead you astray if you apply it unthinkingly.

Let's say we're taking two measurements:


x = 5.26 ∓0.14
y = 1.04 ∓0.12

following the rules of representing significant digits, this gives us

x = 5.3 ∓0.1
y = 1.0 ∓0.1

If :

z = x/y

then with the pre-rounded numbers:

z = 5.1 ∓ 0.6

but with the rounded numbers we have:

z = 5.3 ∓ 0.5

Whoops! This is a big difference. The problem occurred because we applied the advice unthinkingly. We rounded the numbers prematurely; in calculations, we should have kept the full precision and only shown rounded numbers for display to users.

The advice is simple: preserve full precision in calculations and reserve rounding for numbers shown to users.

Spreadsheet data

Spreadsheets are incredible sources of errors and bugs. One of the insidious things spreadsheets do is round numbers, which can result in numbers appearing not to add up. 

Let's have a look at an example. The left of the table shows numbers before rounding. The right of the table shows numbers with rounding (suppressing the decimal places). The numbers on the right don't add up because of rounding (they should sum to 1206). 


  No round   Round
 Jan121.4   Jan  121
 Feb 251.4  Feb 251
 Mar 311.4  Mar 311
 Apr 291.4  Apr 291
 May 141.4  May 141
 Jun 91.4  Jun 91
 TOTAL 1208.4  TOTAL 1208

An insidious problem occurs rounded when numbers are copied from spreadsheets and used in calculations - which is a manifestation of the premature rounding problem I discussed earlier.

1.999... = 2, why 2 != 2, and machine precision

Although it's not strictly a rounding error, I do have to talk about the fact that 1.999... = 2. This result often surprises people, but it's an easy thing to prove. Unfortunately, on machines with finite precision, 1.9999... == 2 will give you False! Just because it's mathematically true, doesn't mean it's true on your system.  

I've seen a handful of cases when two numbers that ought to be the same fail an equality test, the equivalent of 2 == 2 evaluating to False. One of the numbers has been calculated through a repeated calculation and machine precision errors propagate, the other number has been calculated directly. Here's a fun example from Python 3:

1 == (1/7) + (1/7) + (1/7) + (1/7) + (1/7) + (1/7) + (1/7)

evaluates to False!

To get round this problem, I've seen programmers do True/False difference evaluations like this:

abs(a - b) <= machine_precision

The machine precision constant is usually called epsilon.

What to watch for

Cumulative errors in fixed-point systems

The Patriot missile case makes the point nicely: if you're using sensor data in a system using fixed-point arithmetic, or indeed in any computer system, be very careful how your system rounds its inputs. Bear in mind, the rounding might be done in an ADC (analog-to-digital converter) beyond your control - in which case, you need to know how it converts data. If you're doing the rounding, you might need to use some form of dithering.

Default rounding and rounding methods

There are several different rounding methods you can use; your choice should be a deliberate one and you should know their behavior. For example, in Python, you have:

  • floor
  • ceil
  • round - which uses banker's rounding not the school textbook form of rounding and was changed from Python 2 to Python 3.

You should be aware of the properties of each of these rounding methods. If you wanted to avoid the Vancouver Stock Exchange problem, what form of rounding would you choose and why? Are you sure?

A more subtle form of rounding can occur when you mix integers and floating-point numbers in calculations. Depending on your system and the language you use, 7.5/2 can give different answers. I've seen some very subtle bugs involving hidden type conversion, so be careful.

Premature rounding

You were taught to only present numbers to an appropriate numbers of decimal places, but that was only for presentation. For calculations, use the full precision available.

Spreadsheets

Be extremely careful copying numbers from spreadsheets, the numbers may have been rounded and you may need to look closer to get extra digits of precision.

Closing thoughts

Rounding seems like a simple problem that happens to other people, but it can happen to you and it can have serious consequences. Take some time to understand the properties of the system and be especially careful if you're doing cumulative calculations, mixed floating-point and integer calculation, or if you're using a rounding function.

No comments:

Post a Comment