I am wondering what the time complexity of resizing a Java HashMap is when the number of entries exceeds the threshold set by the load factor. As far as I understand, the HashMap table size is always a power of 2 (an even number), so whenever we resize the table we don't necessarily need to rehash all the keys (correct me if I am wrong). All we need to do is allocate additional space and copy over all the entries from the old table (I am not quite sure how the JVM deals with that internally). Is that correct? Hashtable, by contrast, uses a prime number as the table size, so we need to rehash all the entries whenever we resize the table. So my question is: does resizing a HashMap still take O(n) linear time?
You could always just study the source for HashMap. :) (Ted Hopp, Jan 10, 2013 at 5:25)
> Does it still take O(N) time for resizing a HashMap?
Basically, yes.
And a consequence is that an insertion operation that causes a resize will take O(N) time. But because the capacity doubles each time, a resize at size N is followed by roughly N more cheap insertions before the next one, so (under certain assumptions) the average (amortized) insertion time is O(1).
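A minimal sketch of the amortized argument, assuming a hypothetical table that starts with 16 slots, uses load factor 0.75 and doubles when the size exceeds capacity × load factor (the documented defaults of `java.util.HashMap`, but this is a simulation, not HashMap itself). It counts how many entries all resizes move in total while inserting n keys:

```java
/**
 * Hypothetical simulation of HashMap-style growth (not java.util.HashMap itself).
 * The table starts with 16 slots and doubles once size exceeds capacity * loadFactor.
 */
public class AmortizedResize {

    /** Total number of entries moved by all resizes while inserting n keys. */
    static long totalMoves(int n, double loadFactor) {
        long capacity = 16;
        long threshold = (long) (capacity * loadFactor);
        long moves = 0;
        for (long size = 1; size <= n; size++) {   // size after each insertion
            if (size > threshold) {
                moves += size;                      // a resize visits every entry: O(size)
                capacity *= 2;
                threshold = (long) (capacity * loadFactor);
            }
        }
        return moves;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 100_000, 10_000_000}) {
            long moves = totalMoves(n, 0.75);
            System.out.printf("n=%,d  total moves=%,d  moves per insertion=%.2f%n",
                    n, moves, (double) moves / n);
        }
    }
}
```

Each individual resize is O(N), but because the capacity doubles, the resize costs form a geometric series whose sum stays at roughly 2n, which is why the per-insertion cost averages out to O(1).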
> So could a good load factor affect this performance, e.g. make it better and faster than O(N)?
Choice of load factor affects performance, but not complexity.
If we make normal assumptions about the hash function and key clustering, when the load factor is larger:
- the average hash chain length is longer, but still O(1),
- the frequency of resizes is reduced, but is still O(1/N),
- the cost of a resize remains about the same, and its complexity is still O(N).
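To put numbers on the list above, here is a sketch of a hypothetical growth model (16 initial slots, doubling whenever the size exceeds capacity × load factor; not `java.util.HashMap` itself) run with a few arbitrary load factors. It reports the number of resizes, the entries moved per insertion, and the final average chain length (entries ÷ slots):

```java
/** Hypothetical growth model: how the load factor changes resize count and chain length. */
public class LoadFactorEffect {

    static final class Stats {
        final int resizes;     // how many times the table doubled
        final long moves;      // total entries moved by all resizes
        final long capacity;   // final number of slots

        Stats(int resizes, long moves, long capacity) {
            this.resizes = resizes;
            this.moves = moves;
            this.capacity = capacity;
        }
    }

    static Stats simulate(int n, double loadFactor) {
        long capacity = 16;
        long threshold = (long) (capacity * loadFactor);
        int resizes = 0;
        long moves = 0;
        for (long size = 1; size <= n; size++) {
            if (size > threshold) {
                resizes++;
                moves += size;                  // each resize still touches all N entries
                capacity *= 2;
                threshold = (long) (capacity * loadFactor);
            }
        }
        return new Stats(resizes, moves, capacity);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        for (double lf : new double[] {0.5, 0.75, 1.0, 2.0}) {
            Stats s = simulate(n, lf);
            System.out.printf("loadFactor=%.2f  resizes=%d  moves/insert=%.2f  avg chain=%.2f%n",
                    lf, s.resizes, (double) s.moves / n, (double) n / s.capacity);
        }
    }
}
```

A larger load factor gives fewer resizes and longer average chains, but in every case the moves per insertion stay bounded by a small constant: the load factor changes the constants, not the complexity.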
> ... so whenever we resize the table we don't necessarily need to rehash all the keys (correct me if I am wrong).
Actually, you would need to rehash all of the keys. When you double the hash table size, the hash chains need to be split. To do this, you need to test which of the two new chains each key's hash value maps to. (Indeed, you would need to do the same if the hash table used open addressing instead of chaining.)
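A sketch of that split for a power-of-two table. When the capacity doubles from oldCap to 2 × oldCap, a key's index uses exactly one more bit of its hash, so each entry either stays at its old index or moves to old index + oldCap, depending on `hash & oldCap`. This mirrors the "lo"/"hi" split in OpenJDK 8's `HashMap.resize()`, but the class and method names below are hypothetical and the hash values are made up. Note that the loop still has to test every entry, so the resize remains O(N):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of splitting one bucket when a power-of-two table doubles. */
public class ChainSplit {

    /** Bucket index for a hash; valid only because capacity is a power of two. */
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    /**
     * Splits the hashes of one old bucket: element 0 keeps the old index,
     * element 1 moves to oldIndex + oldCap in the doubled table.
     */
    static List<List<Integer>> split(List<Integer> chain, int oldCap) {
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int hash : chain) {                 // every entry still has to be tested
            if ((hash & oldCap) == 0) {
                lo.add(hash);                    // new index bit is 0: same bucket
            } else {
                hi.add(hash);                    // new index bit is 1: bucket + oldCap
            }
        }
        return List.of(lo, hi);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        List<Integer> bucket5 = List.of(5, 21, 37, 53, 69);  // all map to index 5 when capacity is 16
        List<List<Integer>> parts = split(bucket5, oldCap);
        System.out.println("stay at 5:  " + parts.get(0));   // [5, 37, 69]
        System.out.println("move to 21: " + parts.get(1));   // [21, 53]
        for (int h : bucket5) {
            System.out.println(h + " -> " + indexFor(h, 2 * oldCap));
        }
    }
}
```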
However, in the current generation of HashMap implementations, the hashcode values are cached in the chained entry objects, so that the hashcode for a key doesn't ever need to be recomputed.
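A minimal sketch of what caching the hash buys, assuming a hypothetical stripped-down chained map (`CachedHashMap`, not the real `java.util.HashMap`, which also spreads hash bits and treeifies long bins). Each node stores the key's hash in a `final int hash` field when it is created, and `resize()` reads that field instead of calling `hashCode()`. The `CountingKey` class is also hypothetical; it shows that `hashCode()` runs once per `put`, even though the table is resized many times along the way:

```java
import java.util.Objects;

/** Hypothetical minimal chained map whose nodes cache the key's hash. */
public class CachedHashMap<K, V> {

    private static final class Node<K, V> {
        final int hash;          // computed once, at insertion time
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    private Node<K, V>[] table = newTable(16);
    private int size;

    @SuppressWarnings("unchecked")
    private static <K, V> Node<K, V>[] newTable(int capacity) {
        return (Node<K, V>[]) new Node[capacity];
    }

    public void put(K key, V value) {
        int hash = Objects.hashCode(key);                  // the only hashCode() call per put
        int i = hash & (table.length - 1);
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) { e.value = value; return; }
        }
        table[i] = new Node<>(hash, key, value, table[i]);
        if (++size > table.length * 3 / 4) resize();       // load factor 0.75
    }

    public V get(K key) {
        int hash = Objects.hashCode(key);
        for (Node<K, V> e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public int size() { return size; }

    private void resize() {
        Node<K, V>[] newTab = newTable(table.length * 2);
        for (Node<K, V> head : table) {
            for (Node<K, V> e = head; e != null; ) {
                Node<K, V> next = e.next;
                int j = e.hash & (newTab.length - 1);      // cached hash, no hashCode() call
                e.next = newTab[j];
                newTab[j] = e;
                e = next;
            }
        }
        table = newTab;
    }

    /** Hypothetical key type that counts how often hashCode() is invoked. */
    static final class CountingKey {
        static int hashCodeCalls = 0;
        final int id;

        CountingKey(int id) { this.id = id; }

        @Override public int hashCode() { hashCodeCalls++; return Integer.hashCode(id); }

        @Override public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        CachedHashMap<CountingKey, Integer> map = new CachedHashMap<>();
        for (int i = 0; i < 10_000; i++) map.put(new CountingKey(i), i);
        // Many resizes happened, yet hashCode() ran exactly once per put.
        System.out.println("hashCode() calls: " + CountingKey.hashCodeCalls);  // 10000
    }
}
```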
One comment mentioned the degenerate case where all keys hash to the same hashcode. That can happen either due to a poorly designed hash function, or a skewed distribution of keys.
This affects performance of lookup, insertion and other operations, but it does not affect either the cost or frequency of resizes.
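A toy experiment for the degenerate case, assuming a hypothetical chained table (16 initial slots, load factor 0.75) that stores only cached hash values, not keys or values. It inserts the same 1000 keys twice, once with a well-spread hash and once with a constant hash, and reports the resize count and the longest chain. The model is plain chaining only: it ignores the treeification of long bins added in Java 8, which, in tables with fewer than 64 slots, resizes the table instead of building a tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

/** Hypothetical toy chained table: resizes depend on size only, not on hash quality. */
public class DegenerateHashes {

    private static List<List<Integer>> newTable(int capacity) {
        List<List<Integer>> table = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) table.add(new ArrayList<>());
        return table;
    }

    /** Inserts keys 0..n-1; returns {number of resizes, longest chain length}. */
    static int[] run(int n, IntUnaryOperator hashFn) {
        List<List<Integer>> table = newTable(16);
        int size = 0;
        int resizes = 0;
        for (int key = 0; key < n; key++) {
            int h = hashFn.applyAsInt(key);
            table.get(h & (table.size() - 1)).add(h);   // store the cached hash only
            if (++size > table.size() * 3 / 4) {        // the trigger looks at size alone
                List<List<Integer>> bigger = newTable(table.size() * 2);
                for (List<Integer> chain : table) {
                    for (int cached : chain) {
                        bigger.get(cached & (bigger.size() - 1)).add(cached);
                    }
                }
                table = bigger;
                resizes++;
            }
        }
        int longest = 0;
        for (List<Integer> chain : table) longest = Math.max(longest, chain.size());
        return new int[] {resizes, longest};
    }

    public static void main(String[] args) {
        int[] spread = run(1000, key -> key);     // distinct hash per key
        int[] constant = run(1000, key -> 42);    // every key collides
        System.out.println("well-spread: resizes=" + spread[0] + ", longest chain=" + spread[1]);     // 7, 1
        System.out.println("constant:    resizes=" + constant[0] + ", longest chain=" + constant[1]); // 7, 1000
    }
}
```

Both runs resize the same number of times; only the chain lengths (and therefore lookup and insertion cost) differ.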
O(1) but the worst case is O(N). But this isn't that strange. The same thing happens with StringBuffer.append, appending to an ArrayList and so on.

When the table is resized, the entire contents of the original table must be copied to the new table, so it takes O(n) time to resize the table, where n is the number of elements in the original table. The amortized cost of any operation on a HashMap (under the uniform hashing assumption) is O(1), but yes, the worst-case cost of a single insertion operation is O(n).