
INNER PRODUCT // BRUCE DAWSON




          Floating-Point Misses





          Dealing with the Dangers of Floating-Point Math


          Floating-point numbers are ubiquitous in games because they are a convenient way to handle the necessary math. Floats
          gracefully handle overflow and underflow, and they have enough range and precision that we can sometimes forget how limited and
          dangerous they are. Developers naturally want floats to be the same as the real numbers we studied in school, and floats maintain
          the illusion quite well. But sometimes the illusion comes crashing down—and brings your game with it. In this article, we’ll take a
          good long look at how floating-point math can make your life miserable—and some strategies for minimizing that misery.
          32 bits, reinterpreted
          » If you’re reading this, you should already have a solid understanding of the float format, so we’ll keep the
          overview brief: A standard IEEE float consists of a sign bit, 8-bit exponent, and 24-bit mantissa (see Figure 1).
          Yes, this adds up to 33 bits, but all that magically fits into a 32-bit package, because the leading one of the
          mantissa is, for numbers at or above FLT_MIN, implied instead of being explicitly stored.
            A handy feature of the floating-point format is that if you increment the 32-bit representation of a float, then
          you move to the next float away from zero. Adjacent floats (of the same sign) have adjacent integer
          representations. Incrementing the integer representation normally just increments the mantissa, but if the
          mantissa is all ones, incrementing the 32-bit integer instead overflows the mantissa to zero and increments the
          exponent field. Due to the magic of the implied leading one, this still gives you the next float. This technique
          works all the way from zero to infinity, and from negative zero to negative infinity, and we’ll discuss its
          application later.
            The range of a float is generally plenty large enough, but the precision is a bit weak. A 24-bit mantissa means
          (for instance) that for numbers above about 16 million, a float actually has less precision than a 32-bit integer.
          The rule of thumb is that, over virtually the entire range, the precision of a float is between one part in
          8 million and one part in 16 million.

          Don’t torment the math gods
          » 24 bits of precision is enough for a lot of purposes, but if you’re not careful you may inadvertently throw away
          most of that precision. It turns out that subtraction and addition are the most dangerous operations for losing
          precision. Specifically, subtraction of numbers with similar magnitude (or, equivalently, addition of oppositely
          signed numbers with similar magnitude) loses a lot of precision, and so does adding a small number to, or
          subtracting it from, a large number.

          Subtraction of similar
          » Imagine that you want to measure someone’s height with a very long tape measure that is accurate to about one
          part in a thousand. If you measure someone’s height directly, then your answer will be accurate to within about
          two millimeters. Now imagine that instead you decided to stand the person on top of the Empire State Building,
          measure their height from the ground, measure the height of the building, and then subtract the two numbers.
          Instead of measuring 1.80 m directly, you are now measuring 382.8 m and then subtracting 381.0 m, with both
          measurements having an error of about 0.38 m. Your answer will be the person’s height, plus or minus about 0.76 m.
          The loss of most of the top digits when subtracting similar numbers is known as catastrophic cancellation.

          Addition of dissimilar
          » Now imagine that you have two numbers with dissimilar ranges (perhaps the height of the Empire State Building
          and the height of a person) and both of these numbers are accurate to about one part in a thousand. If you add
          them and store the result in a number that is accurate to one part in a thousand, then most of the digits of the
          small number will be lost. If you do this repeatedly, the cumulative loss can grow arbitrarily large.

          It’s the sig-figs, baby
          » It surprises a lot of people that multiplication and division aren’t the source of more math errors, probably
          because those





          Figure 1: The anatomy of a standard IEEE float.
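
The field layout in Figure 1, and the adjacent-integer property described in the text, can be sketched in a few lines of C++. This is an illustration rather than code from the article: the names FloatFields, float_bits, bits_to_float, decompose, and next_float_away_from_zero are hypothetical, and the sketch assumes that float is a 32-bit IEEE type on the target platform.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Field breakdown of a 32-bit IEEE float (illustrative names).
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t mantissa;  // 23 stored bits; the leading one is implied
};

// Copy the float's bits into an integer. memcpy sidesteps the undefined
// behavior of type punning through a pointer cast.
inline std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Inverse of float_bits: reinterpret a 32-bit pattern as a float.
inline float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Split a float into its sign, biased exponent, and stored mantissa bits.
inline FloatFields decompose(float f) {
    const std::uint32_t bits = float_bits(f);
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}

// Next representable float away from zero, found by incrementing the
// integer representation. Only meaningful for finite inputs.
inline float next_float_away_from_zero(float f) {
    assert(std::isfinite(f));
    return bits_to_float(float_bits(f) + 1u);
}
```

For finite inputs this should agree with the standard library: next_float_away_from_zero(1.0f) matches std::nextafter(1.0f, 2.0f), and incrementing the largest float below 2.0f (all mantissa bits set) carries into the exponent field and yields exactly 2.0f. In production code std::nextafter is the safer choice, since it also handles infinities and NaNs.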
          34  GAME DEVELOPER   |   OCTOBER 2012
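
The two precision traps described in the text, subtracting similar numbers and adding a small number to a large one, can be reproduced with the Empire State Building figures and a simple accumulator. The sketch below is illustrative only: the function names are hypothetical, and it assumes that float arithmetic is evaluated in single precision with the default round-to-nearest mode (true for typical SSE or ARM builds, not guaranteed on every compiler and target).

```cpp
#include <cassert>
#include <cmath>

// Add `step` to a float accumulator `count` times, starting from `start`.
// Every addition rounds to the nearest float, so a step smaller than half
// the spacing between floats near `sum` is lost on each iteration.
inline float accumulate_float(float start, float step, int count) {
    assert(count >= 0);
    float sum = start;
    for (int i = 0; i < count; ++i) {
        sum += step;
    }
    return sum;
}

// The same loop with a double accumulator, for comparison.
inline double accumulate_double(double start, double step, int count) {
    assert(count >= 0);
    double sum = start;
    for (int i = 0; i < count; ++i) {
        sum += step;
    }
    return sum;
}

// Height measured indirectly: (building + person) minus building.
// The subtraction itself is exact here; the damage is that the rounding
// error already present in the large input dominates the small result.
inline float height_by_subtraction(float top_of_head, float building) {
    return top_of_head - building;
}
```

Under those assumptions, accumulate_float(16777216.0f, 1.0f, 1000) returns 16777216.0f unchanged. Floats above 2^24 are spaced two apart, so each added 1.0f is rounded away, while the double version reaches 16778216.0. Likewise, height_by_subtraction(382.8f, 381.0f) lands near 1.79999 rather than on 1.8f, an error of roughly 1.2e-5, which is about a hundred times the spacing between floats near 1.8.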