Java 集合 HashMap 源码学习 (JDK-1.8)

发表于 2017-04-02 更新于 2023-01-27 分类于 java 阅读次数： Waline：本文字数： 12k 阅读时长 ≈ 11 分钟

小组开小讲堂，在小讲堂上听了HashMap这部分内容，决定看源码再巩固一下。顺便分享给大家。
我参考的是的是JDK 1.8 HashMap的源码

在这之前，先问大家几个问题

new一个HashMap的默认长度是多少
默认加载因子
HashMap的底层存储
为什么HashMap的大小必须是2的幂次方
key的Hash方法
hashCode碰撞时怎么处理
扩容时怎么处理
HashMap的最大长度

数组：采用一段连续的存储单元来存储数据。对于指定下标的查找，时间复杂度为 O(1)，但在数组中间以及头部插入数据时，需要复制移动后面的元素。

链表：一种在物理存储单元上非连续、非顺序的存储结构，数据元素的逻辑顺序是通过链表中的指针链接次序实现的。链表由一系列结点（链表中每一个元素）组成，结点可以在运行时动态生成。每个结点都包含“存储数据单元的数据域”和“存储下一个结点地址的指针域”这两个部分。由于链表不用必须按顺序存储，所以链表在插入的时候可以达到 O(1) 的复杂度，但查找一个结点或者访问特定编号的结点需要 O(n) 的时间。

哈希表：根据关键码值（Key value）直接进行访问的数据结构。通过把关键码值映射到表中一个位置来访问记录，以加快查找的速度。这个映射函数叫做哈希函数，存放记录的数组就叫做哈希表。

树：由 n（n≥1）个有限结点组成的一个具有层次关系的集合，就像是一棵倒挂的树。

哈希表将键的 Hash 值映射到内存地址，即根据键获取对应的值，并将其存储到内存地址。也就是说 HashMap 是根据键的 Hash 值来决定对应值的存储位置。通过这种索引方式，HashMap 获取数据的速度会非常快。

哈希表是怎么解决的呢？方式有很多，比如，开放定址法、再哈希函数法和链地址法。

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

i = (n - 1) & hash

(n - 1) & hash 是怎么设计的，这里的 n 代表哈希表的长度，哈希表习惯将长度设置为 2 的 n 次方，这样恰好可以保证 (n - 1) & hash 的计算得到的索引值总是位于 table 数组的索引之内。
因为数组是0bash的长度为16，实际使用的是 0-15，这样4位就够用了

例如：hash=15，n=16 时，结果为 15；hash=17，n=16 时，结果为 1。

二进制	十进制	十六进制
0000 0000	0	00
0000 0001	1	01
0000 0010	2	02
0000 0011	3	03
0000 0100	4	04
0000 0101	5	05
0000 0110	6	06
0000 0111	7	07
0000 1000	8	08
0000 1001	9	09
0000 1010	10	0A
0000 1011	11	0B
0000 1100	12	0C
0000 1101	13	0D
0000 1110	14	0E
0000 1111	15	0F

| 0001 0000 | 16 | 10 |

当 HashMap 中只存在数组，而数组中没有 Node 链表时，是 HashMap 查询数据性能最好的时候。一旦发生大量的哈希冲突，就会产生 Node 链表，这个时候每次查询元素都可能遍历 Node 链表，从而降低查询数据的性能。


/**
 * 默认加载因子0.75
 * The load factor used when none specified in constructor. 
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;


/**
 * 默认容量 16
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16


/**
 * 默认构造方法
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
	this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}


/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and the default load factor (0.75).
 *
 * @param  initialCapacity the initial capacity.
 * @throws IllegalArgumentException if the initial capacity is negative.
 */
public HashMap(int initialCapacity) {
	this(initialCapacity, DEFAULT_LOAD_FACTOR);
}


/**
 * Constructs an empty <tt>HashMap</tt> with the specified initial
 * capacity and load factor.
 *
 * @param  initialCapacity the initial capacity
 * @param  loadFactor      the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
	if (initialCapacity < 0)
		throw new IllegalArgumentException("Illegal initial capacity: " +
										   initialCapacity);
	if (initialCapacity > MAXIMUM_CAPACITY)
		initialCapacity = MAXIMUM_CAPACITY;
	if (loadFactor <= 0 || Float.isNaN(loadFactor))
		throw new IllegalArgumentException("Illegal load factor: " +
										   loadFactor);
	this.loadFactor = loadFactor;
	this.threshold = tableSizeFor(initialCapacity);
}

// 默认的初始容量是16 - 2的幂。
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

// 最大容量，必须是2的幂且小于2的30次方，传入容量过大将被这个值替换
// 1 << 30，也就是2的30次方  
static final int MAXIMUM_CAPACITY = 1 << 30;

// 
static final float DEFAULT_LOAD_FACTOR = 0.75f;

//
static final int TREEIFY_THRESHOLD = 8;

//
static final int UNTREEIFY_THRESHOLD = 6;

//
static final int MIN_TREEIFY_CAPACITY = 64;

一个静态内部类Node
static class Node<K,V> implements Map.Entry<K,V>

内部类KeySet
final class KeySet extends AbstractSet

内部类Values
final class Values extends AbstractCollection

内部类EntrySet
final class EntrySet extends AbstractSet<Map.Entry<K,V>>

抽象类HashIterator
abstract class HashIterator

内部类KeyIterator
final class KeyIterator extends HashIterator implements Iterator

final class ValueIterator extends HashIterator
implements Iterator

…

// 构造方法，传入初始容量和加载因子
// 方法会对初始容量进行校验，大于0，小于2的30次方
// 方法会对加载因子进行校验，大于0，并且部位NaN
public HashMap(int initialCapacity, float loadFactor)

// 构造方法，传入初始容量
// 同上
// 使用默认加载因子0.75
public HashMap(int initialCapacity)

// 构造方法
// 同上
// 使用默认初始容量16，默认加载因子0.75
public HashMap()

// 构造方法
// 传入map
// 使用默认加载因子0.75
public HashMap(Map<? extends K, ? extends V> m)

// 根据key计算hash值
static final int hash(Object key)

//
static Class<?> comparableClassFor(Object x)

//
static int compareComparables(Class<?> kc, Object k, Object x)

// Returns a power of two size for the given target capacity.
static final int tableSizeFor(int cap)

// Implements Map.putAll and Map constructor
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict)

// Returns the number of key-value mappings in this map.
public int size()

// Returns true if this map contains no key-value mappings.
public boolean isEmpty()

// Returns the value to which the specified key is mapped,
// or null if this map contains no mapping for the key.
public V get(Object key)

// Implements Map.get and related methods
final Node<K,V> getNode(int hash, Object key)

// Returns true if this map contains a mapping for the specified key.
public boolean containsKey(Object key)

// Associates the specified value with the specified key in this map.
// If the map previously contained a mapping for the key, the old
// value is replaced.
public V put(K key, V value)

// Implements Map.put and related methods
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict)

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

Hashmap继承于AbstractMap，实现了Map、Cloneable、java.io.Serializable接口。
它的key、value都可以为null，映射不是有序的。　　　　
Hashmap不是同步的，如果想要线程安全的HashMap，可以通过Collections类的静态方法synchronizedMap获得线程安全的HashMap。
Map map = Collections.synchronizedMap(new HashMap());

HashMap 中两个影响其性能的重要参数：“初始容量” 和 “加载因子”。
容量：是哈希表中桶的数量，初始容量只是哈希表在创建时的容量
加载因子：是哈希表在其容量自动增加之前可以达到多满的一种尺度（默认0.75）。　
当哈希表中的条目数超出了加载因子与当前容量的乘积时，则要对该哈希表进行 rehash 操作（扩容，即重建内部数据结构，桶数Ｘ２）。
加载因子越大,填满的元素越多,好处是,空间利用率高了,但:冲突的机会加大了.反之,加载因子越小,填满的元素越少,好处是:冲突的机会减小了,但:空间浪费多了.

HashMap数据结构

HashMap数据结构
Hashmap本质是数组加链表。
通过key的hashCode来计算hash值的，只要hashCode相同，计算出来的hash值就一样，然后再计算出数组下标，
如果多个key对应到同一个下标，就用链表串起来，新插入的在前面。

长度必须是2的幂次方

这里的h是”int hash = hash(key.hashCode());”, 也就是根据key的hashCode再次进行一次hash操作计算出来的 .
length是Entry数组的长度 .
一般我们利用hash码, 计算出在一个数组的索引, 常用方式是”h % length”, 也就是求余的方式 .
可能是这种方式效率不高, SUN大师们发现, “当容量一定，是2^n时，h & (length - 1) == h % length” . 按位运算特别快 .

/** 
 * Returns index for hash code h. 
 */  
static int indexFor(int h, int length) {  
    return h & (length-1);  
}

对于length = 16, 对应二进制”1 0000”, length-1=”0 1111”
假设此时h = 17 .
(1) 使用”h % length”, 也就是”17 % 16”, 结果是1 .
(2) 使用”h & (length - 1)”, 也就是 “1 0001 & 0 1111”, 结果也是1 .
我们会发现, 因为”0 1111”低位都是”1”, 进行”&”操作, 就能成功保留”1 0001”对应的低位, 将高位的都丢弃, 低位是多少, 最后结果就是多少 .
刚好低位的范围是”0~15”, 刚好是长度为length=16的所有索引 .

线程安全

Map m = Collections.synchronizeMap(hashMap);

ConcurrentHashMap

(1.1) 哈希表提供的功能

新建哈希表、新增数据、修改数据、删除数据、查询数据

以Java里的HashMap为例 JDK8里HashMap api https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html

JDK8里HasMap源码 https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

新建哈希表

/**
 * 创建一个HashMap使用指定的容量和加载因子
 *
 * @param  initialCapacity 初始容量
 * @param  loadFactor      加载因子
 * @throws IllegalArgumentException  
 */
public HashMap(int initialCapacity, float loadFactor) {
    // 省略部分代码
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

新增/修改


/**
 * 将指定的值与此映射中的指定键相关联。 如果映射以前包含键的映射，则会替换旧值。
 *
 * @param key 
 * @param value 
 * @return 
 */
public V put(K key, V value) {
    // 
    return putVal(hash(key), key, value, false, true);
}

删除

/**
 * 从该映射中删除指定键的映射（如果存在）。
 *
 * @param  key 键
 * @return 
 */
public V remove(Object key) {
    Node<K,V> e;
    return (e = removeNode(hash(key), key, null, false, true)) == null ?
        null : e.value;
}

查询


/**
 * 返回key对应的value
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

(1.2) 哈希函数

/**
 * 计算 key.hashCode() 并将散列的高位散布 (XOR) 到低位。 
 * 因为该表使用二次方掩码，所以仅在当前掩码以上的位上有所不同的散列集将始终发生冲突。 
 *（在已知的例子中有一组 Float 键在小表中保存连续的整数。）
 * 所以我们应用一个转换来向下传播较高位的影响。 在位扩展的速度、效用和质量之间存在权衡。 
 * 因为许多常见的哈希集已经合理分布（因此不会从传播中受益），并且因为我们使用树来处理 bins 中的大量冲突，
 * 所以我们只是以最便宜的方式对一些移位的位进行 XOR 以减少系统损失， 以及合并最高位的影响，
 * 否则由于表边界而永远不会在索引计算中使用这些位。
 */
static final int hash(Object key) {
    int h;
    // h = key.hashCode() 为第一步 取hashCode值
    // h ^ (h >>> 16)  为第二步 高位参与运算
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

取key的hashCode值、高位运算

(1.3) 哈希冲突

在Java的HashMap里,哈希冲突时采用的是链地址法，冲突的节点通过链表或红黑树存放。

/**
 * 实现Map.put和相关方法
 *
 * @param hash key的哈希
 * @param key 键
 * @param value 值
 * @param onlyIfAbsent 如果为true，则不更改现有值
 * @param evict 如果为false，则表处于创建模式。
 * @return 上一个值 或者 null(如果没有的时候)
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // 步骤①：tab为空则创建
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // 步骤②：计算index，并对null做处理     
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else { // value不为null
        Node<K,V> e; K k;
        // 步骤③：节点key存在，直接覆盖value
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // 步骤④：判断该链为红黑树    
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // 步骤⑤：该节点为链表
            for (int binCount = 0; ; ++binCount) {
                // p.next==null 代表是尾结点
                if ((e = p.next) == null) {
                    // 把新建的节点放到p.next 
                    p.next = newNode(hash, key, value, null);
                    //链表长度大于8转换为红黑树进行处理
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // key已经存在直接覆盖value
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    // 步骤⑥：超过当前最大容量 就扩容
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

(1.4) 扩容-rehash


/**
 * 初始化或加倍表大小。 如果为 null，则根据字段阈值中保留的初始容量目标进行分配。
 * 否则，由于我们使用的是 2 次幂扩展，因此每个 bin 中的元素必须保持在同一索引上，或者在新表中以 2 的偏移量移动。
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

References

[1] 深入解析HashMap、HashTable
[2] HashMap和Hashtable的区别
[3] JAVA中HashMap和Hashtable区别
[4] Differences between HashMap and Hashtable?
[5] HashMap的工作原理
[6] Java集合之HashMap源码解析 | gyl-coder
[7] Java集合之HashMap源码解析
[8] HashMap的长度为什么设置为2的n次方
[8] HashMap深度解析(一)
[9] HashMap深度解析(二)
[10] Java集合框架源码解析之HashMap