This is another article in the scala-notes series. This one deals with what I call number madness.
Numbers are used everywhere in code, and often the type used is something like
Int (Scala) or
int (Java). These simple
types are familiar to those of us who used languages like C but should not be used without considering
their inherent limitations.
Lets imagine an example web application that during its normal operation records interesting events…
Imagine a field used to represent the key of a database table named
INTERESTING_EVENT, specifically an automatically incrementing surrogate key,
as is typical. (This is not the best example, in part because one would rarely need perform arithmetic on a key value, but it is simple to explain.)
Lets say that on average, only twenty
INTERESTING_EVENT records are populated per second.
Integer safe? Lets get some perspective…
- 365 days per year
- $= 365$
- 8760 hours per year
- $= 24 \cdot 365$
- 525,600 minutes per year
- $= 60 \cdot 8760$
- 31,536,000 seconds per year
- $= 60 \cdot 525,600$
Given a liner expansion of the
INTERESTING_EVENT sequence, by twenty each second, the value would grow by 630,720,000 each year.
- 630,720,000 events per year
- $= 31,536,000 \cdot 20$
In four years that would grow to 2,524,608,000.
- 2,522,880,000 events in four years
- $= 630,720,000 \cdot 4$
java.lang.Integer is 2,147,483,647.
The example is contrived and unlikely but also recall that automatically incremented keys do not always grow linearly, and twenty events per second is conservative for a web server… (Insert your own web scale humor here.)
But even with those constraints the key grows beyond the capacity of
java.lang.Integer in less that four years. Four years is enough time to forget about the capacity constraint built into the example application.
You might forget how these simple number types work and think:
No problem, statistics or other processes are running, constantly computing metrics against
Eventsand some computations use the key value… Surely an exception will be thrown as soon as the Integer overflows.
This is where you would be wrong.
Division will throw an
ArithmeticException on division by zero, but a sum, for example, will quietly continue on an overflow. Let me repeat that… The sum will return an incorrect value on overflow,
but it will not throw a runtime exception! This is serious problem. The application will quietly continue to run, but the mathematical operations on the value will be incorrect. This should be a real
JVM stores an
int as a two’s complement binary (the core of
scala.Int), and thus it is often referred to as a two’s complement type.
Specifically, the JVM uses a 32-bit signed two’s complement integer for
int. The most significant bit is used to designate the sign of the number.
The max value for
int is 2,147,483,647, and would thus be represented1 like
0111 1111 1111 1111 1111 1111 1111 1111.
First lets simply add 1 to 2,147,483,646.
The result is what we expect,
0111 1111 1111 1111 1111 1111 1111 1111, or 2,147,483,647 in decimal.
int is signed, that is to say both positive and negative values can be stored as an
Using the MSB (most significant bit) to designate sign makes it possible to store negative values using the same placeholders and arithmetic. A negative is represented
with the MSB set to
1. To convert -2,147,483,647 to a two’s complement binary for
int, we must find the complement of each bit then add one to the result.
Find the complement of each bit of
0111 1111 1111 1111 1111 1111 1111 1111.
1000 0000 0000 0000 0000 0000 0000 0000
The minimum value would be the value with the greatest magnitude and the opposite sign. The 32 bit, negative, two’s complement number with the greatest magnitude
would be where the sign bit is set to
1 and all the other bits are set to
1000 0000 0000 0000 0000 0000 0000 0000. This means the min value2 is -2,147,483,648 in decimal.
Bad arithmetic without error
Lets see what happens when we overflow by adding one to 2,147,483,647.
Wait! We have seen this one before, it is -2,147,483,648! So in this case the result is mathematically incorrect, but a valid
Int.MaxValue + 1
Integer.MAX_VALUE + 1
java.math.BigInteger) to the rescue?
Clearly when arithmetic is needed, or when it is possible to overflow a field, one should use something safe like BigInteger over its primitive counterpart. Since one can never know for certain how a field will be used, selecting a safe type should be the default.
If one can be sure that arithmetic will never be performed on a field, and benchmarking shows that using a primitive is needed, then maybe it is worth the risk.
- Simple quick arithmetic.
- Fixed memory allocation size.
- Symbol operators. (
+ - * /)
- Arithmetic can fail without error!
- Correct arithmetic
- Can not use symbol operators; must use
- Arithmetic could be slower.
- Higher memory allocation requirements because more information is stored and capacity is dynamic.
If you live in the Scala world you have better options, such as
scala.math.BigInt. It can be thought of a Scala’s
BigInteger. In Scala, characters such
+ - * / can be used as method names, so this makes using
scala.math.BigInt a pleasant experience.
Finally, when talking about handling numbers in Scala, we need to talk about spire. Spire makes working with numbers a beautiful experience.
1 2 3 4 5
Look at how clear and simple it is to create a
BigInt using spire string interpolation.
Using spaces makes it easier to identify the place values for each digit and results in source that is simpler to read.
Spire supports additional number types, type classes and more. Lets take SafeLong for example.
It works as a nice replacement for
BigInt. While it provides accurate arithmetic, independant of value, it performs better than
BigInt for values inside the
A two’s complement type like
int should be considered an optimization choice for Scala and Java software development, and not a default.
Its performance comes at a cost and that cost is silent failure.
The default choice for a whole number field should be something like
SafeLong because they ensure correct arithmetic without regard to the
size of the operands. This is made far more pleasant in Scala, especially when using a library like spire.