Konubinix' opinionated web of thoughts

Prime Number in a Hashing Function

Fleeting

data structures - Why is it best to use a prime number as a mod in a hashing function? - Computer Science Stack Exchange

Consider the set of keys K={0,1,…,100}K={0,1,…,100} and a hash table where the number of buckets is m=12m=12. Since 33 is a factor of 1212, the keys that are multiples of 33 will be hashed to buckets that are multiples of 33:

Keys {0,12,24,36,…}{0,12,24,36,…} will be hashed to bucket 00. Keys {3,15,27,39,…}{3,15,27,39,…} will be hashed to bucket 33. Keys {6,18,30,42,…}{6,18,30,42,…} will be hashed to bucket 66. Keys {9,21,33,45,…}{9,21,33,45,…} will be hashed to bucket 99. If KK is uniformly distributed (i.e., every key in KK is equally likely to occur), then the choice of mm is not so critical. But, what happens if KK is not uniformly distributed? Imagine that the keys that are most likely to occur are the multiples of 33. In this case, all of the buckets that are not multiples of 33 will be empty with high probability (which is really bad in terms of hash table performance).

This situation is more common that it may seem. Imagine, for instance, that you are keeping track of objects based on where they are stored in memory. If your computer’s word size is four bytes, then you will be hashing keys that are multiples of 44. Needless to say that choosing mm to be a multiple of 44 would be a terrible choice: you would have 3m/43m/4 buckets completely empty, and all of your keys colliding in the remaining m/4m/4 buckets.

you want the number you are multiplying by and the number of buckets you are inserting into to have orthogonal prime factorizations.

While it’s essential to understand the roles that hashCode() and equals() methods play, we don’t have to implement them from scratch every time, as most IDEs can generate custom hashCode() and equals() implementations and since Java 7, we got an Objects.hash() utility method for comfortable hashing:

And Eclipse produces this one: @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((email = null) ? 0 : email.hashCode()); result = prime * result + (int) (id ^ (id >>> 32)); result = prime * result + ((name = null) ? 0 : name.hashCode()); return result; } In addition to the ab