Back to CS 101

So, you read Alex’ post on applying the hashing trick to matrix factorization or our paper on the subject and thought you’d just go ahead and implement it. And instead of computing two hashes, one for the Rademacher function ( [latex]\sigma[/latex] in the paper) and one for the index in the storage array, you compute just one and use the sign of the hash as [latex]\sigma[/latex] and the absolute value for the index. Which looks something like this in Java:

double[] storage; // The underlying storage array
int hash(int index){ // The hash function used
  [...]
}
double get(int index){
  int realIndex = hash(index);
  return Math.signum((double)realIndex) * storage[Math.abs(realIndex) % storage.length];
}

Looks innocent enough, if a bit wordy. However, once every 2^32 times, it will go wrong, horribly. And that is when the hash function returns the value -2^31. In this case the Math.abs() call returns -2^31, as there is no representation of 2^31 in a signed 32Bit integer, which is documented behavior. But who reads documentation anyway? ;-) Thankfully, Andrew reminded me when the ArrayIndexOutOfBoundsException hit complaining about a negative index.

Luckily, a simple fix is available:

[...]
double get(int index){
  int realIndex = hash(index);
  return Math.signum((double)realIndex) * storage[Math.abs(realIndex % storage.length)];
}
[...]

This assumes, of course, that the modulo operator works for negative operants, which, thankfully, it does. Also, if storage.length is bigger than 2^31, this will again fail. However, this is impossible in Java as arrays are bound to sizes expressed as signed 32Bit integers, which limits them in size to 2^31-1.