Question:
In Java, the hash code for a String
object is computed as
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using integer arithmetic, where s[i]
is the i
-th character of the string, n
is the length of the string, and ^
indicates exponentiation.
Why is 31 used as a multiplier?
I understand that the multiplier should be a relatively large prime number. So why not 29, or 37, or even 97?
Answer:
Hash code is often used as a key for hash tables, called dictionaries. It's common for the maximum value of possible codes to be stored in 32 bits, so it makes sense to use the maximum multiple of 32 and the immediate lower prime is 31. Not that you need to use all the codes, but from that number you can derive the index plus appropriate according to the amount of buckets possible in that specific spread thus giving a good distribution.
According to comments, today it is considered that there are better (larger) numbers , but as far as I know the reason for the initial choice was this. A smaller number could generate code collisions much more easily. A bigger one is really better, but the gain difference isn't that big, a smaller one is much worse.
On some platforms a shift operation of certain numbers is cheap, on others it is not, in some cases there is optimization for some numbers, as in the case of 31 which can be used (it is a shift and a simple subtraction).
You can say that it's not a well thought out number, no in-depth assessment was made, something that has a sensational justification 🙂
A comparison was made on SOen . It seems that certain numbers are the same, but note that other observations need to be made, the analysis cannot be taken in isolation. Ali does not show other problems of each number.