Scala Notes Number Madness
This is another article in the scala-notes series. This one deals with what I call number madness.
What!?
Numbers are used everywhere in code, and often the type used is something like Int
(Scala) or Integer
/int
(Java). These simple types are familiar to those of us who used languages like C but should not be used without considering their inherent limitations. Let’s imagine an example web application that during its normal operation records interesting events…
Imagine a field used to represent the key of a database table named INTERESTING_EVENT
, specifically an automatically incrementing surrogate key, as is typical. (This is not the best example, in part because one would rarely need perform arithmetic on a key value, but it is simple to explain.) Let’s say that on average, only twenty INTERESTING_EVENT
records are populated per second.
Is an Int
or Integer
safe? Let’s get some perspective…
- 365 days per year
- 8760 hours per year
- 525,600 minutes per year
- 31,536,000 seconds per year
Given a liner expansion of the INTERESTING_EVENT
sequence, by twenty each second, the value would grow by 630,720,000 each year.
- 630,720,000 events per year
In four years that would grow to 2,524,608,000.
- 2,522,880,000 events in four years
The MAX_VALUE
for java.lang.Integer
is 2,147,483,647.
The example is contrived and unlikely but also recall that automatically incremented keys do not always grow linearly, and twenty events per second is conservative for a web server… (Insert your own web scale humor here.) But even with those constraints the key grows beyond the capacity of java.lang.Integer
in less that four years. Four years are enough time to forget about the capacity constraint built into the example application.
You might forget how these simple number types work and think:
No problem, statistics or other processes are running, constantly computing metrics against
Events
and some computations use the key value… Surely an exception will be thrown as soon as the Integer overflows.
This is where you would be wrong.
Division will throw an ArithmeticException
on division by zero, but a sum, for example, will quietly continue on an overflow. Let me repeat that… The sum will return an incorrect value on overflow, but it will not throw a runtime exception! This is serious problem. The application will quietly continue to run, but the mathematical operations on the value will be incorrect. This should be a real “What!?” moment.
Two’s complement
The JVM
stores an int
as a two’s complement binary (the core of java.lang.Integer
and scala.Int
), and thus it is often referred to as a two’s complement type.
Specifically, the JVM uses a 32-bit signed two’s complement integer for int
. The most significant bit is used to designate the sign of the number. The max value for int
is 2,147,483,647, and would thus be represented1 like 0111 1111 1111 1111 1111 1111 1111 1111
.
Simple sum
First lets simply add 1 to 2,147,483,646.
The result is what we expect, 0111 1111 1111 1111 1111 1111 1111 1111
, or 2,147,483,647 in decimal.
Negative values
The JVM int
is signed, that is to say both positive and negative values can be stored as an int
.
Using the MSB (most significant bit) to designate sign makes it possible to store negative values using the same placeholders and arithmetic. A negative is represented with the MSB set to 1
. To convert -2,147,483,647 to a two’s complement binary for int
, we must find the complement of each bit then add one to the result.
Find the complement of each bit of
0111 1111 1111 1111 1111 1111 1111 1111
.
1000 0000 0000 0000 0000 0000 0000 0000
Add one
The minimum value would be the value with the greatest magnitude and the opposite sign. The 32 bit, negative, two’s complement number with the greatest magnitude would be where the sign bit is set to 1
and all the other bits are set to 0
, i.e.. 1000 0000 0000 0000 0000 0000 0000 0000
. This means the min value2 is -2,147,483,648 in decimal.
Bad arithmetic without error
Let’s see what happens when we overflow by adding one to 2,147,483,647.
Wait! We have seen this one before, it is -2,147,483,648! So in this case the result is mathematically incorrect, but a valid int
!
You can try out Scala code snips in the online REPL here, and Java in the online REPL here. The following return -2147483648
.
- Scala
Int.MaxValue + 1
- Java
Integer.MAX_VALUE + 1
BigInteger (java.math.BigInteger
) to the rescue?
Clearly when arithmetic is needed, or when it is possible to overflow a field, one should use something safe like BigInteger over its primitive counterpart. Since one can never know for certain how a field will be used, selecting a safe type should be the default.
If one can be sure that arithmetic will never be performed on a field, and benchmarking shows that using a primitive is needed, then maybe it is worth the risk.
Primitive Type
- Pros
- Simple quick arithmetic.
- Fixed memory allocation size.
- Symbol operators. (
+ - * /
) - Cons
- Arithmetic can fail without error!
BigInteger (java.math.BigInteger
)
- Pros
- Correct arithmetic
- Cons
- Can not use symbol operators; must use
add
,subtract
,multiply
, anddivide
methods. - Arithmetic could be slower.
- Higher memory allocation requirements because more information is stored and capacity is dynamic.
BigInt (scala.math.BigInt
)
If you live in the Scala world you have better options, such as scala.math.BigInt
. It can be thought of a Scala’s BigInteger
. In Scala, characters such as + - * /
can be used as method names, so this makes using scala.math.BigInt
a pleasant experience.
Spire
Finally, when talking about handling numbers in Scala, we need to talk about spire. Spire makes working with numbers a beautiful experience.
//spire example
object Inspired extends App {
import spire.syntax.literals.si._
val y = big"2 147 483 647"
println(s"y, using spire string interpolation = ${y + 1}")
}
Look at how clear and simple it is to create a BigInt
using spire string interpolation. Using spaces makes it easier to identify the place values for each digit and results in source that is simpler to read. Spire supports additional number types, type classes and more. Let’s take SafeLong for example. It works as a nice replacement for BigInt
. While it provides accurate arithmetic, independant of value, it performs better than BigInt
for values inside the java.lang.Long
range.3
Conclusion
A two’s complement type like int
should be considered an optimization choice for Scala and Java software development, and not a default. Its performance comes at a cost and that cost is silent failure. The default choice for a whole number field should be something like BigInt
or SafeLong
because they ensure correct arithmetic without regard to the size of the operands. This is made far more pleasant in Scala, especially when using a library like spire.