In C and C++, the unsigned modifier can be added to an integer variable declaration. It tells the compiler to treat the value of the variable as an unsigned value in arithmetic operations. One common reason for using unsigned arithmetic is the extra range it offers:
When an integer is signed, one of its bits becomes the sign bit, meaning that the maximum magnitude of the number is halved. (So an unsigned 32-bit int can store up to 2^{32}-1, whereas its signed counterpart has a maximum positive value of 2^{31}-1.)
In Java, all integer types are signed (except char). Even bytes are signed in Java, which is arguably a questionable design decision! So what do we do if we want to treat a value as unsigned in Java? In most typical cases in which unsigned values are used, it actually turns out not to be too difficult to get the same result in Java.
It's important to remember that the unsigned keyword affects the interpretation, not the representation of a number. In other words, in cases where we aren't interpreting a value arithmetically— so-called bitwise operations such as AND, OR, XOR— it makes essentially no difference whether a value is marked as "signed" or "unsigned".
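To illustrate the point, here is a minimal sketch (the class and method names are made up for this example) that XORs the same two 32-bit patterns twice: once as plain signed ints, and once as "unsigned" values held in longs and masked to 32 bits. The resulting bit patterns should be identical.

```java
// Illustrative sketch only; class and method names are hypothetical.
class BitwiseSignedness {

    // XOR on plain (signed) ints.
    static int xorSigned(int a, int b) {
        return a ^ b;
    }

    // XOR on the same 32-bit patterns, treated as unsigned values held in longs.
    static long xorUnsigned(long a, long b) {
        return (a ^ b) & 0xffffffffL;
    }

    public static void main(String[] args) {
        int a = 0xF0F0F0F0;   // negative when read as a signed int
        int b = 0x0FF00FF0;

        int signedResult = xorSigned(a, b);
        long unsignedResult = xorUnsigned(a & 0xffffffffL, b & 0xffffffffL);

        // Both lines print the same 32-bit pattern: ff00ff00
        System.out.println(Integer.toHexString(signedResult));
        System.out.println(Long.toHexString(unsignedResult));
    }
}
```

Only the way we print or compare the result depends on the interpretation; the bits themselves do not.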
An important exception is the right shift operator, represented by two "greater than" symbols: >>. When applied to a signed value, in both C/C++ (with practically all compilers) and Java, this operator performs sign extension: that is, as well as shifting the bits of the number to the right, it preserves the sign. Specifically, for each place shifted, it copies the sign bit (the leftmost bit) back into the leftmost position.
Now, if we're treating an integer as unsigned, then we don't want to copy the sign bit, because it doesn't actually represent the sign! Instead, we want to leave it as zero. To achieve this, in Java, instead of writing >>, we write >>>. This variant of the shift is sometimes called a logical shift, and the previous variant— which takes account of the sign— an arithmetic shift. At the machine code level, most architectures in fact provide different instructions for the two shifts, and the C/C++ compiler chooses the appropriate one depending on whether we've declared the variable in question as unsigned. In Java, we must explicitly say which type we require.
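The difference between the two shifts is easiest to see on a value with only the top bit set. The following is a small sketch (the class and method names are hypothetical) that applies both operators to the same int:

```java
// Illustrative sketch only; class and method names are hypothetical.
class ShiftDemo {

    // Arithmetic shift: copies of the sign bit are shifted in from the left.
    static int arithmeticShift(int value, int places) {
        return value >> places;
    }

    // Logical shift: zeros are shifted in from the left.
    static int logicalShift(int value, int places) {
        return value >>> places;
    }

    public static void main(String[] args) {
        int value = 0x80000000;   // only the top (sign) bit set

        System.out.println(Integer.toHexString(arithmeticShift(value, 4))); // f8000000
        System.out.println(Integer.toHexString(logicalShift(value, 4)));    // 8000000
    }
}
```

For values whose top bit is clear, the two operators give the same result, which is why the distinction only matters once we start using that bit for magnitude.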
As an example of unsigned bitwise operations in C/C++ vs Java, we'll look at the XORshift random number generator. Invented by George Marsaglia (2003)^{1}, the function provides a fast means of generating medium-quality random numbers using only a single variable or register. Each pass generates a new random number using two left shifts and one right shift, plus three exclusive or (XOR) operations, all using unsigned arithmetic. In C/C++, we simply declare the variable as unsigned. In Java, we ignore the signedness for the two bitwise operations (XOR and left shift) where it is irrelevant, and explicitly use the logical right shift, which ignores the sign:
C/C++:
    unsigned int seed = ...;

    unsigned int randNumber() {
      seed ^= (seed << 1);
      seed ^= (seed >> 5);   // 2 > symbols!
      seed ^= (seed << 9);
      return seed;
    }
Java:
    int seed = ...;

    int randNumber() {
      seed ^= (seed << 1);
      seed ^= (seed >>> 5);  // 3 > symbols!
      seed ^= (seed << 9);
      return seed;
    }
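For readers who want something they can run, here is a self-contained sketch of the Java version. The class name, the constructor and the seed value of 1 are assumptions made for illustration; the shift amounts (1, 5, 9) are simply those used in the listings above. A zero seed is rejected because the generator would stay at zero forever.

```java
// Sketch only: class name, constructor and seed value are hypothetical.
// The shift amounts (1, 5, 9) are taken from the listings above.
class XorShift32 {
    private int seed;

    XorShift32(int seed) {
        if (seed == 0) {
            // XORing and shifting zero always yields zero, so the generator would be stuck.
            throw new IllegalArgumentException("seed must be non-zero");
        }
        this.seed = seed;
    }

    int next() {
        seed ^= (seed << 1);
        seed ^= (seed >>> 5);   // logical shift: no sign extension
        seed ^= (seed << 9);
        return seed;
    }

    public static void main(String[] args) {
        XorShift32 rng = new XorShift32(1);
        for (int i = 0; i < 5; i++) {
            // Mask into a long so that the value is displayed as unsigned.
            System.out.println(rng.next() & 0xffffffffL);
        }
    }
}
```

Note that the generator itself never needs to know about signedness; the mask is applied only at the point where we display the number.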
By arithmetical operations on unsigned integers, we mean cases where we want the upper bit to represent magnitude. Normally with a (signed) Java int, the result of the following would be a negative number, as we "roll over" past the largest positive integer that an int can store:
    int n = 1 << 30;
    System.out.println("n was " + n);
    n *= 2;
    System.out.println("n is now " + n);

gives:

    n was 1073741824
    n is now -2147483648
The usual way of getting round this problem is simply to use a type with a larger size and then "chop" off the extra bits (set them to zero). For example, an unsigned byte can be held in an int, and an unsigned 32-bit int in a long.
To "chop off the extra bits", we need to AND with the bits that we are interested in. For example, if we want to end up with an unsigned byte (8 bits), we need to AND with the value 0xff (255 in decimal, or 11111111 in binary): in other words, "the lowest 8 bits set". So the following are essentially equivalent:
C/C++:
    unsigned char b = ...;
    b += 100;

    unsigned int v = ...;
    v *= 2;
Java:
    int b = ...;
    b = (b + 100) & 0xff;

    long v = ...;
    v = (v * 2) & 0xffffffffL;   // L suffix: the mask must be a long
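The masking technique can be tried out with a few concrete values. The sketch below wraps the two operations in helper methods (the class and method names are hypothetical). One detail worth watching is the L suffix on the 32-bit mask: written without it, 0xffffffff is the int -1, which widens to a long with all 64 bits set, so the AND would chop nothing off.

```java
// Illustrative sketch only; class and method names are hypothetical.
class UnsignedMasking {

    // Add to an "unsigned byte" held in an int; the result wraps into 0..255.
    static int addUnsignedByte(int b, int amount) {
        return (b + amount) & 0xff;
    }

    // Double an "unsigned int" held in a long; the result wraps into 0..2^32-1.
    static long doubleUnsignedInt(long v) {
        return (v * 2) & 0xffffffffL;   // long mask: note the L suffix
    }

    public static void main(String[] args) {
        System.out.println(addUnsignedByte(200, 100));        // 44
        System.out.println(doubleUnsignedInt(1L << 30));      // 2147483648
        System.out.println(doubleUnsignedInt(0xffffffffL));   // 4294967294
    }
}
```

The second line of output is the value that the plain int version could not represent: doubling 2^{30} now gives a positive 2^{31} rather than rolling over to a negative number.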
When we eventually want to write the unsigned value, it is OK to simply cast to the appropriate size. For example:
    // Example: write an unsigned int (stored in a long)
    // to a byte buffer.
    ByteBuffer bb = ...;
    long unsignedInt = ...;
    bb.putInt((int) unsignedInt);
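Reading the value back is the mirror image: we read a (signed) int and mask it into a long. The following round-trip sketch assumes a 4-byte heap buffer and a sample value of 4000000000, which does not fit in a signed int; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only; class and method names are hypothetical.
class UnsignedIntBuffer {

    // The cast keeps the low 32 bits, which is exactly the unsigned bit pattern.
    static void writeUnsignedInt(ByteBuffer bb, long unsignedInt) {
        bb.putInt((int) unsignedInt);
    }

    // Mask with a long constant so that the top bit is read as magnitude, not sign.
    static long readUnsignedInt(ByteBuffer bb) {
        return bb.getInt() & 0xffffffffL;
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(4);
        writeUnsignedInt(bb, 4000000000L);
        bb.flip();   // switch the buffer from writing to reading
        System.out.println(readUnsignedInt(bb));   // 4000000000
    }
}
```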
64-bit unsigned arithmetic is trickier in Java, because there's no "next size up" to go to. This means that in some cases, we need to re-interpret the sign bit ourselves. On the next page, we look at unsigned arithmetic in Java in a bit more detail, focussing on the 64-bit case.
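As a brief taste of what "re-interpreting the sign bit" can look like, here is one common trick for comparing two longs as if they were unsigned: flipping the top bit of both values maps unsigned ordering onto signed ordering. This is a sketch under the assumption of Java 8 or later, where the standard library also offers Long.compareUnsigned and Long.toUnsignedString; the class and method names below are hypothetical.

```java
// Illustrative sketch only; class and method names are hypothetical.
// Assumes Java 8 or later for Long.compareUnsigned and Long.toUnsignedString.
class UnsignedLongDemo {

    // Flipping the sign bit of both operands turns an unsigned comparison
    // into an ordinary signed one.
    static int compareAsUnsigned(long a, long b) {
        return Long.compare(a ^ Long.MIN_VALUE, b ^ Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long big = -1L;    // all 64 bits set: 2^64 - 1 when read as unsigned
        long small = 1L;

        System.out.println(compareAsUnsigned(big, small) > 0);       // true
        System.out.println(Long.compareUnsigned(big, small) > 0);    // true
        System.out.println(Long.toUnsignedString(big));              // 18446744073709551615
    }
}
```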
1. Marsaglia, G. (2003) Xorshift RNGs, Journal of Statistical Software 8(14). As discussed in the paper, various values and combinations of shifts can actually be used, each giving a full period of 2^{32}-1. The method is generally a good "medium quality" random number generator that passes various statistical tests. Clearly, in order not to get "stuck", this single-variable method can never be allowed to produce the same output value twice in succession (or zero). If range were not an issue, one could always take bits from the middle of the integer as the final output to randNumber(). The method also works with long values to give a greater range and a period of 2^{64}-1. If you are mathematically minded and interested in the statistical properties of this generator, see also Panneton, F. & L'Ecuyer, P. (2005) On the Xorshift Random Number Generators, ACM Transactions on Modeling and Computer Simulation, 15(4).
Editorial page content written by Neil Coffey. Copyright © Javamex UK 2021. All rights reserved.