[Data Structures] (11) Map and Set

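I. Introduction to Map and Set

1. Set and Map

Map and Set are the last part of the Java collections framework covered in this series. Both are interfaces, so they have to be instantiated through implementing classes such as TreeSet and HashSet, or TreeMap and HashMap. Note that Set extends Collection, while Map does not.

A Set stores keys only, and every key must be unique. A Map stores key-value pairs: keys must be unique, but values may repeat.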

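2. Use cases for Set and Map

They are used wherever data needs to be searched. The common search methods are:

Linear traversal: O(N), inefficient for large data sets.
Binary search: more efficient at O(logN), but requires the data to be sorted.

Both methods suit static data. If binary search is used on dynamic data, for example, every modification forces the data to be re-sorted, which is inefficient. Dynamic data is better served by Set and Map. First, lookups are fast: O(logN) for the tree-based implementations and O(1) on average for the hash-based ones. Second, they grow dynamically and adjust their internal structure automatically, so their defining properties are preserved.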

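3. Using Map

3.1 Map.Entry<K, V>

Map.Entry<K, V> is an interface nested inside Map. An entry plays roughly the same role as a Node in a binary tree.

Fields (in implementing classes): a key and a value.

Methods provided: getting the key and the value, setting the value, and overridden equals, toString and hashCode.

3.2 Common methods

Note: HashMap is not thread-safe, whereas ConcurrentHashMap is.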
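As an illustration of the commonly used Map methods, here is a minimal sketch. The class name MapDemo and the helper countWords are made up for this example; only the java.util calls (put, get, getOrDefault, containsKey, remove, entrySet) are standard.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original article.
class MapDemo {
    // Counts how often each word occurs, using getOrDefault to handle the first occurrence.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.put(w, counts.getOrDefault(w, 0) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new String[] {"b", "a", "b"});
        System.out.println(counts.get("b"));         // 2
        System.out.println(counts.get("z"));         // null: the key is missing
        System.out.println(counts.containsKey("a")); // true
        counts.remove("a");
        // A Map is not Iterable; iterate over entrySet() (or keySet() / values()).
        // Copying into a TreeMap makes the iteration order sorted by key.
        for (Map.Entry<String, Integer> e : new TreeMap<>(counts).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Putting a key that already exists overwrites its value, which is why the counting loop can call put unconditionally.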

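4. Using Set

4.1 Common methods

4.2 Other notes

A Set is backed by a Map. In TreeSet, for example, each element is inserted into an underlying TreeMap as a key, with a shared placeholder Object as the value.

LinkedHashSet builds on HashSet and maintains a doubly linked list that records the insertion order of the elements, so that iteration follows that order. (Access order is an option of LinkedHashMap, not of LinkedHashSet.)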
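The following sketch shows the common Set operations and how iteration order differs between the three implementations. SetDemo and fill are names invented for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original article.
class SetDemo {
    // Adds the values to the given set and returns its iteration order as text.
    static String fill(Set<Integer> set, List<Integer> values) {
        set.addAll(values);
        return set.toString();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(30, 10, 20, 10); // 10 appears twice
        System.out.println(fill(new TreeSet<>(), values));       // [10, 20, 30]: sorted by key
        System.out.println(fill(new LinkedHashSet<>(), values)); // [30, 10, 20]: insertion order
        System.out.println(fill(new HashSet<>(), values));       // no order is guaranteed

        Set<Integer> set = new HashSet<>(values);
        System.out.println(set.size());       // 3: the duplicate is stored once
        System.out.println(set.contains(20)); // true
        set.remove(20);
        System.out.println(set.contains(20)); // false
    }
}
```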

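II. Binary search trees

1. What is a binary search tree

Every node in the left subtree is smaller than the root.
Every node in the right subtree is larger than the root.
Every subtree satisfies the two conditions above.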
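The three conditions can be checked mechanically. The sketch below is one way to do it, assuming int keys and a hypothetical Node class; it passes the allowed value range down the tree, because comparing a node only with its direct children would miss violations deeper in a subtree.

```java
// Hypothetical demo class, not part of the original article.
class BstCheckDemo {
    static class Node {
        int val;
        Node left, right;

        Node(int val, Node left, Node right) {
            this.val = val;
            this.left = left;
            this.right = right;
        }
    }

    // Every node must lie strictly between the bounds inherited from its ancestors.
    static boolean isBst(Node node, long low, long high) {
        if (node == null) return true;
        if (node.val <= low || node.val >= high) return false;
        return isBst(node.left, low, node.val) && isBst(node.right, node.val, high);
    }

    static boolean isBst(Node root) {
        return isBst(root, Long.MIN_VALUE, Long.MAX_VALUE);
    }

    // 5 with left subtree {3, 1, 4} and right child 8: a valid BST.
    static Node validExample() {
        return new Node(5, new Node(3, new Node(1, null, null), new Node(4, null, null)),
                new Node(8, null, null));
    }

    // 6 sits in the left subtree of 5: locally fine (3 < 6) but invalid overall.
    static Node invalidExample() {
        return new Node(5, new Node(3, null, new Node(6, null, null)), new Node(8, null, null));
    }

    public static void main(String[] args) {
        System.out.println(isBst(validExample()));   // true
        System.out.println(isBst(invalidExample())); // false
    }
}
```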

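2. Performance of a binary search tree

To search for a key, start at root: if the key is smaller than root.val go left, and if it is larger go right. For a complete binary search tree the worst-case time complexity is the height of the tree, O(logN). For an extremely unbalanced tree it is O(N), which amounts to visiting every node.

We therefore want the binary search tree to stay as balanced as possible. In the JDK source, TreeMap and TreeSet are backed by a red-black tree, which uses recoloring and rotations to keep the tree approximately balanced: the longest root-to-leaf path is at most twice the shortest, so the height stays O(logN). (A height difference of at most 1 between subtrees is the stricter AVL condition, not the red-black one.)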

3. Implementing a simple binary search tree (without balancing)

3.1 Fields

public class BinarySearchTree {
    static class TreeNode {
        int val;
        TreeNode left;
        TreeNode right;

        public TreeNode(int val) {
            this.val = val;
        }
    }

    private TreeNode root;
    ......
}

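3.2 Search

If the subtree is empty, the search ends without a match; return null.
If key equals the value of the subtree root, return that root.
Otherwise, if key is smaller than the value of the subtree root, continue in its left subtree; if key is larger, continue in its right subtree.

    public TreeNode search(int key) {
        TreeNode curRoot = root; // root of the subtree currently being searched
        while (curRoot != null) {
            if (key == curRoot.val) { // found
                return curRoot;
            } else if (key < curRoot.val) { // continue in the left subtree
                curRoot = curRoot.left;
            } else { // continue in the right subtree
                curRoot = curRoot.right;
            }
        }
        return null; // not found
    }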

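3.3 Insert

If the tree is empty, the new node becomes the root.
If the value is smaller than the subtree root, insert into the left subtree.
If the value is larger than the subtree root, insert into the right subtree.
When an empty subtree is reached, attach the new node as a child of its parent: if the empty subtree is the parent's left subtree (the key is smaller than the parent), attach it as the left child; if it is the parent's right subtree (the key is larger than the parent), attach it as the right child.

    public void insert(int val) {
        TreeNode newNode = new TreeNode(val);
        if (root == null) { // empty tree: the new node becomes the root
            root = newNode;
            return;
        }
        TreeNode preRoot = null; // parent of curRoot
        TreeNode curRoot = root;
        while (curRoot != null) {
            if (val == curRoot.val) { // already present, do not insert again
                return;
            } else if (val < curRoot.val) { // descend into the left subtree
                preRoot = curRoot;
                curRoot = curRoot.left;
            } else { // descend into the right subtree
                preRoot = curRoot;
                curRoot = curRoot.right;
            }
        }
        if (val < preRoot.val) { // attach as the parent's left child
            preRoot.left = newNode;
        } else { // attach as the parent's right child
            preRoot.right = newNode;
        }
    }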

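3.4 Delete

If the tree is empty, there is nothing to delete.
Search for the node to delete, node.
If node has no left child, its right child replaces it: if node is the root, the right child becomes the root; otherwise node's parent links to node's right child.
If node has no right child, its left child replaces it: if node is the root, the left child becomes the root; otherwise node's parent links to node's left child.
If node has both children, find either the largest node in its left subtree (the rightmost node there) or the smallest node in its right subtree (the leftmost node there), copy its value into node, and remove that replacement node instead.

In the code below, target is the smallest node in the right subtree and parent is target's parent.

Case 1, parent is not the node being deleted: parent's left links to target's right.

Case 2, parent is the node being deleted (target is the root of the right subtree): parent's right links to target's right.

    public void delete(int val) {
        // Find the node to delete
        TreeNode preRoot = null; // parent of the node to delete
        TreeNode curRoot = root;
        while (curRoot != null) {
            if (val == curRoot.val) { // found the node to delete
                break;
            } else if (val < curRoot.val) { // continue in the left subtree
                preRoot = curRoot;
                curRoot = curRoot.left;
            } else { // continue in the right subtree
                preRoot = curRoot;
                curRoot = curRoot.right;
            }
        }
        if (curRoot == null) { // node not found; this also covers the empty tree
            return;
        }
        deleteNode(curRoot, preRoot);
    }

    private void deleteNode(TreeNode node, TreeNode preNode) {
        // The node to delete has no left subtree
        if (node.left == null) {
            if (root == node) { // it is the root
                root = node.right;
            } else if (preNode.left == node) { // it is its parent's left child
                preNode.left = node.right;
            } else { // it is its parent's right child
                preNode.right = node.right;
            }
            return;
        }
        // The node to delete has no right subtree
        if (node.right == null) {
            if (root == node) { // it is the root
                root = node.left;
            } else if (preNode.left == node) { // it is its parent's left child
                preNode.left = node.left;
            } else { // it is its parent's right child
                preNode.right = node.left;
            }
            return;
        }
        // The node to delete has both subtrees:
        // replace its value with the minimum of the right subtree, then remove that minimum
        TreeNode parent = node;
        TreeNode target = node.right;
        // Walk to the leftmost node of the right subtree, i.e. its minimum
        while (target.left != null) {
            parent = target;
            target = target.left;
        }
        // Overwrite the value of the node to delete with the minimum
        node.val = target.val;
        if (parent == node) { // target is the root of the right subtree
            parent.right = target.right;
        } else { // target is a left child further down the right subtree
            parent.left = target.right;
        }
    }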

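3.5 Performance analysis

Insertion and deletion both start with a search, so search efficiency determines the performance of every operation on a binary search tree: O(logN) when the tree is balanced, degrading to O(N) when it is not.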
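How strongly the insertion order affects an unbalanced tree can be seen by measuring its height. The sketch below uses its own minimal Node class and a recursive insert (both assumptions for this example, separate from the BinarySearchTree class above) and compares sorted input with input that happens to produce a full tree.

```java
// Hypothetical demo class, not part of the original article.
class BstHeightDemo {
    static class Node {
        int val;
        Node left, right;

        Node(int val) {
            this.val = val;
        }
    }

    // Plain BST insertion without balancing; duplicates are ignored.
    static Node insert(Node root, int val) {
        if (root == null) return new Node(val);
        if (val < root.val) root.left = insert(root.left, val);
        else if (val > root.val) root.right = insert(root.right, val);
        return root;
    }

    // Height counted in nodes: an empty tree has height 0.
    static int height(Node root) {
        return root == null ? 0 : 1 + Math.max(height(root.left), height(root.right));
    }

    // Builds a tree from the keys in the given order and returns its height.
    static int heightAfterInserting(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6, 7)); // 7: sorted input degenerates into a chain
        System.out.println(heightAfterInserting(4, 2, 6, 1, 3, 5, 7)); // 3: same keys, full tree
    }
}
```

With N keys the chain has height N while the full tree has height about log2(N+1), which is the gap between O(N) and O(logN) search.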

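III. Hash tables

3.1 What is a hash table

A hash function takes a key and computes the position where its value is stored. A structure built with this hashing method is called a hash table. Because locating an element needs only one evaluation of the hash function, insertion, deletion and search are all O(1) on average, as long as collisions are rare.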

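3.2 Hash collisions

3.2.1 What is a hash collision

A collision occurs when the hash function maps different keys to the same position. Collisions are unavoidable, so the goal is to make them as unlikely as possible.

3.2.2 How to avoid hash collisions

Design a sensible hash function: every hash value must fall inside the table's address range; hash values should be evenly distributed; and the function should be simple to compute.

Common hash functions:

① Direct addressing: hash(key) = a*key+b, a linear function chosen according to the distribution of the keys.

② Division method: hash(key) = key % p, where p is a prime with p ≤ m that is as close to m as possible, and m is the size of the address range.

Adjust the load factor: load factor = number of elements stored in the table / length of the table. The collision rate rises and falls with the load factor, so lowering the collision rate means lowering the load factor. Since the data has to be stored, the only option is to enlarge the table. In the JDK source, the hash table is resized automatically once the load factor exceeds a threshold.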
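A small worked example of the division method and the load factor follows. The values p = 7 and m = 8 are chosen purely for illustration, and Math.floorMod is used instead of % so that negative keys also land in a valid slot (an addition that the formula above does not mention).

```java
// Hypothetical demo class, not part of the original article.
class HashFunctionDemo {
    // Division method: hash(key) = key mod p, kept in [0, p) even for negative keys.
    static int hash(int key, int p) {
        return Math.floorMod(key, p);
    }

    // Load factor = number of stored elements / table length.
    static double loadFactor(int size, int capacity) {
        return (double) size / capacity;
    }

    public static void main(String[] args) {
        int p = 7; // a prime not larger than the table length m = 8
        System.out.println(hash(10, p)); // 3
        System.out.println(hash(17, p)); // 3: collides with key 10
        System.out.println(hash(-4, p)); // 3, whereas -4 % 7 evaluates to -4
        System.out.println(loadFactor(6, 8));  // 0.75
        System.out.println(loadFactor(6, 16)); // 0.375 after doubling the table
    }
}
```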

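3.2.3 How to resolve hash collisions

When a collision does occur, it has to be resolved so that every element can still be stored.

Closed hashing (open addressing): when a collision occurs and the table is not full, look for another empty slot.

① Linear probing: starting from the colliding position, scan forward for the first empty slot and insert the element there.

Drawbacks: colliding elements tend to cluster; elements cannot simply be removed, so a pseudo-deletion (tombstone) mark is needed.

② Quadratic probing: Hash_i(key) = (Hash_0(key) ± i^2) % m, where m is the table size and i is the number of collisions so far.

Drawback: the clusters are more spread out, but clustering still occurs.

Advantage of closed hashing: no additional data structure (such as a linked list) is needed.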
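The need for a pseudo-deletion mark is easiest to see in code. The following is a minimal sketch of linear probing for int keys with a fixed capacity and no resizing; the class name LinearProbingSet and the three slot states are assumptions made for this example.

```java
// Hypothetical demo class, not part of the original article.
class LinearProbingSet {
    private static final int EMPTY = 0, USED = 1, DELETED = 2; // DELETED is the tombstone mark
    private final int[] keys;
    private final int[] state;

    LinearProbingSet(int capacity) {
        keys = new int[capacity];
        state = new int[capacity];
    }

    private int hash(int key) {
        return Math.floorMod(key, keys.length);
    }

    // Returns the slot holding key, or -1 if absent. Probing stops only at an EMPTY slot.
    int indexOf(int key) {
        int start = hash(key);
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            if (state[slot] == EMPTY) return -1;
            if (state[slot] == USED && keys[slot] == key) return slot;
        }
        return -1;
    }

    // Inserts key into the first free slot (EMPTY or DELETED) at or after its home slot.
    boolean add(int key) {
        if (indexOf(key) >= 0) return false; // already present
        int start = hash(key);
        for (int i = 0; i < keys.length; i++) {
            int slot = (start + i) % keys.length;
            if (state[slot] != USED) {
                keys[slot] = key;
                state[slot] = USED;
                return true;
            }
        }
        return false; // table full; a real table would have resized earlier
    }

    // Marks the slot DELETED rather than EMPTY so keys later in the probe chain stay reachable.
    boolean remove(int key) {
        int slot = indexOf(key);
        if (slot < 0) return false;
        state[slot] = DELETED;
        return true;
    }

    public static void main(String[] args) {
        LinearProbingSet set = new LinearProbingSet(10);
        set.add(4);
        set.add(14); // collides with 4, lands in slot 5
        set.add(24); // collides again, lands in slot 6
        set.remove(14);
        System.out.println(set.indexOf(24)); // 6: still found because slot 5 holds a tombstone
    }
}
```

If remove reset the slot to EMPTY instead, the search for 24 would stop at slot 5 and wrongly report the key as missing.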

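Open hashing (hash buckets / separate chaining): keys that map to the same position form one collection (a bucket) and are linked together in a list. The JDK source uses hash buckets.
When collisions are severe, searching within a bucket becomes slow because too many elements pile up. Once a bucket grows beyond a certain length, it can be converted into a search tree or a nested hash table.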

3.3 Implementing a simple hash bucket

3.3.1 Fields

public class HashBucket {
    static class Node {
        int key;
        int value;
        Node next;

        public Node(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    public Node[] arr = new Node[10];
    public int usedSize; // number of stored elements
    public static final float LOAD_FACTOR = 0.75f; // load factor threshold
    ......
}

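3.3.2 Adding an element

    // Hash function; floorMod keeps the index non-negative for negative keys
    public int hash(int key) {
        return Math.floorMod(key, arr.length);
    }

    // Current load factor
    public float loadFactor() {
        return (float) usedSize / arr.length;
    }

    // Resize
    public void resize() {
        Node[] newArr = new Node[arr.length * 2]; // double the array length
        // Walk the old array and move every node into the new array
        for (Node node : arr) {
            Node curr = node;
            while (curr != null) {
                // The index must be computed against the new length;
                // hash() still uses the old array at this point
                int newIndex = Math.floorMod(curr.key, newArr.length);
                // Head insertion into the new bucket
                Node next = curr.next;
                curr.next = newArr[newIndex];
                newArr[newIndex] = curr;
                curr = next;
            }
        }
        arr = newArr;
    }

    // Add an element
    public void add(int key, int value) {
        int index = hash(key); // bucket index
        // Walk the whole bucket, including its last node; if the key exists, update its value
        Node curr = arr[index];
        Node tail = null;
        while (curr != null) {
            if (curr.key == key) {
                curr.value = value;
                return;
            }
            tail = curr;
            curr = curr.next;
        }
        // Key not found: append to the bucket (or start the bucket if it is empty)
        Node newNode = new Node(key, value);
        if (tail == null) {
            arr[index] = newNode;
        } else {
            tail.next = newNode;
        }
        usedSize++;
        // Resize once the load factor reaches the threshold, to reduce the chance of collisions
        if (loadFactor() >= LOAD_FACTOR) {
            resize();
        }
    }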

3.3.3 Looking up a value by key

    public int get(int key) {
        int index = hash(key); // bucket index
        Node curr = arr[index];
        // Walk the bucket; return the value if the key is found
        while (curr != null) {
            if (curr.key == key) {
                return curr.value;
            }
            curr = curr.next;
        }
        return -1; // key not found
    }

3.3.4 Deleting an element

    public void delete(int key) {
        int index = hash(key); // bucket index
        Node curr = arr[index];
        Node prev = null;
        // Walk the bucket; unlink the node whose key matches
        while (curr != null) {
            if (curr.key == key) {
                if (prev == null) { // head of the bucket: the bucket now starts at the next node
                    arr[index] = curr.next;
                } else { // otherwise link the previous node to the next node
                    prev.next = curr.next;
                }
                usedSize--;
                return;
            }
            prev = curr;
            curr = curr.next;
        }
    }

3.3.5 Generic version

public class HashBucket2<K, V> {
    static class Node<K, V> {
        K key;
        V value;
        Node<K, V> next;

        public Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    public Node<K, V>[] arr = (Node<K, V>[]) new Node[10];
    public int usedSize; // number of stored elements
    public static final float LOAD_FACTOR = 0.75f; // load factor threshold

    // Hash function; hashCode() may be negative, so use floorMod instead of %
    public int hash(K key) {
        return Math.floorMod(key.hashCode(), arr.length);
    }

    // Current load factor
    public float loadFactor() {
        return (float) usedSize / arr.length;
    }

    // Resize
    public void resize() {
        Node<K, V>[] newArr = (Node<K, V>[]) new Node[arr.length * 2]; // double the array length
        // Walk the old array and move every node into the new array
        for (Node<K, V> node : arr) {
            Node<K, V> curr = node;
            while (curr != null) {
                // The index must be computed against the new length
                int newIndex = Math.floorMod(curr.key.hashCode(), newArr.length);
                // Head insertion into the new bucket
                Node<K, V> next = curr.next;
                curr.next = newArr[newIndex];
                newArr[newIndex] = curr;
                curr = next;
            }
        }
        arr = newArr;
    }

    // Add an element
    public void add(K key, V value) {
        int index = hash(key); // bucket index
        // Walk the whole bucket, including its last node; if the key exists, update its value
        Node<K, V> curr = arr[index];
        Node<K, V> tail = null;
        while (curr != null) {
            if (curr.key.equals(key)) {
                curr.value = value;
                return;
            }
            tail = curr;
            curr = curr.next;
        }
        // Key not found: append to the bucket (or start the bucket if it is empty)
        Node<K, V> newNode = new Node<>(key, value);
        if (tail == null) {
            arr[index] = newNode;
        } else {
            tail.next = newNode;
        }
        usedSize++;
        // Resize once the load factor reaches the threshold, to reduce the chance of collisions
        if (loadFactor() >= LOAD_FACTOR) {
            resize();
        }
    }

    // Look up a value by key
    public V get(K key) {
        int index = hash(key); // bucket index
        Node<K, V> curr = arr[index];
        // Walk the bucket; return the value if the key is found
        while (curr != null) {
            if (curr.key.equals(key)) {
                return curr.value;
            }
            curr = curr.next;
        }
        return null; // key not found
    }

    // Delete an element
    public void delete(K key) {
        int index = hash(key); // bucket index
        Node<K, V> curr = arr[index];
        Node<K, V> prev = null;
        // Walk the bucket; unlink the node whose key matches
        while (curr != null) {
            if (curr.key.equals(key)) {
                if (prev == null) { // head of the bucket: the bucket now starts at the next node
                    arr[index] = curr.next;
                } else { // otherwise link the previous node to the next node
                    prev.next = curr.next;
                }
                usedSize--;
                return;
            }
            prev = curr;
            curr = curr.next;
        }
    }
}

Key points:

The types become generic.
The hash code of a key is computed with hashCode.
Keys are compared with equals.

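IV. Binary search tree vs. hash table

Aspect	TreeSet / TreeMap	HashSet / HashMap
Key ordering	Sorted by key	Unordered
Underlying structure	Red-black tree	Hash buckets
Comparison and overriding	Keys must be comparable: the custom class implements Comparable (overriding compareTo), or a custom Comparator (overriding compare) is passed in.	Custom classes must override equals and hashCode.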
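The last row of the comparison can be illustrated with a custom key class. In the sketch below, Student and KeyContractDemo are hypothetical names; the point is only which methods each kind of collection relies on.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original article.
class KeyContractDemo {
    static class Student implements Comparable<Student> {
        final String name;
        final int age;

        Student(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // Used by TreeSet / TreeMap when no Comparator is supplied.
        @Override
        public int compareTo(Student other) {
            int c = name.compareTo(other.name);
            return c != 0 ? c : Integer.compare(age, other.age);
        }

        // Used by HashSet / HashMap: equal objects must produce equal hash codes.
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Student)) return false;
            Student s = (Student) o;
            return age == s.age && name.equals(s.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, age);
        }
    }

    // Two equal students are stored once in a HashSet.
    static int hashSetSize() {
        Set<Student> set = new HashSet<>();
        set.add(new Student("Ann", 20));
        set.add(new Student("Ann", 20));
        return set.size();
    }

    // A Comparator passed to the constructor overrides the ordering defined by compareTo.
    static String youngestFirst() {
        Set<Student> set = new TreeSet<>(Comparator.comparingInt((Student s) -> s.age));
        set.add(new Student("Bob", 22));
        set.add(new Student("Ann", 25));
        return set.iterator().next().name;
    }

    public static void main(String[] args) {
        System.out.println(hashSetSize());   // 1
        System.out.println(youngestFirst()); // Bob
    }
}
```

Without the equals and hashCode overrides, the HashSet would treat the two "Ann" objects as different keys and report a size of 2.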

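V. Reading the source code

5.1 Fields

5.2 Hash code

5.3 Constructors

5.4 Adding a key-value pair

Correction: (n-1) & hash is equivalent to hash % arr.length for a non-negative hash, and only because the array length n is always a power of two.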
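The equivalence can be checked directly. This sketch compares the bit mask with Math.floorMod (the non-negative remainder) over a range of hash values; IndexMaskDemo and its method names are made up for the example.

```java
// Hypothetical demo class, not part of the original article.
class IndexMaskDemo {
    // Index computation used when the table length n is a power of two.
    static int maskIndex(int hash, int n) {
        return (n - 1) & hash;
    }

    // True if the mask and the non-negative remainder agree for every hash in [from, to].
    static boolean agrees(int n, int from, int to) {
        for (int h = from; h <= to; h++) {
            if (maskIndex(h, n) != Math.floorMod(h, n)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agrees(16, -1000, 1000)); // true: 16 is a power of two
        System.out.println(agrees(10, 0, 1000));     // false: 10 is not
        System.out.println(maskIndex(-3, 16));       // 13: the mask never goes negative
        System.out.println(-3 % 16);                 // -3: plain % can
    }
}
```

Besides being cheaper than a division, the mask always yields a valid array index even when the hash is negative.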

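5.5 Resizing

5.6 Treeification

5.7 Key takeaways

From the resize code: when the no-argument constructor is used, the first put allocates a table of 16 slots.
From the resize code: in the normal case, each resize doubles the capacity.
From the code that adds a key-value pair: the first condition for converting a linked list into a red-black tree is that the list length reaches 8.
From the treeification code: the second condition is that the array capacity has reached 64; below that, the table is resized instead.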

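VI. OJ practice problems

6.1 Single Number

136. Single Number - LeetCode

Approach: traverse all elements; add an element if it is not in the set, and remove it if it is. Whatever remains at the end is the single number. A HashSet is used because its search, remove and add operations are O(1) on average, so one pass over the array is O(N).

import java.util.HashSet;

class Solution {
    public int singleNumber(int[] nums) {
        HashSet<Integer> set = new HashSet<>();
        // Traverse the array
        for (int num : nums) {
            if (!set.contains(num)) { // not in the set yet: add it
                set.add(num);
            } else { // already present: remove it
                set.remove(num);
            }
        }
        // The one element left in the set is the single number
        return set.toArray(new Integer[0])[0];
    }
}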

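6.2 Copy List with Random Pointer

138. Copy List with Random Pointer - LeetCode

Approaches:

1. Traverse the list once, remembering the previously created node; after creating the node for the current position, link it to the remembered one. The random pointer, however, is arbitrary and cannot be copied in this first pass. A second pass still cannot copy random directly, unless we locate the old random target, compute its distance from the current node, and then use that distance to find the corresponding new node. This is very cumbersome.

2. Bind each old node to its new node as a key-value pair (first pass), then look up the old next and random nodes to find the new nodes bound to them (second pass). This is implemented with a HashMap. Note: Node does not override equals or hashCode, so keys are compared by object identity, which is unique per node; old nodes can therefore serve as keys.

import java.util.HashMap;
import java.util.Map;

/*
// Definition for a Node.
class Node {
    int val;
    Node next;
    Node random;

    public Node(int val) {
        this.val = val;
        this.next = null;
        this.random = null;
    }
}
*/
class Solution {
    public Node copyRandomList(Node head) {
        Map<Node, Node> map = new HashMap<>();
        // First pass: create a new node for every old node and bind the two
        Node tmp = head;
        while (tmp != null) {
            Node newNode = new Node(tmp.val);
            map.put(tmp, newNode);
            tmp = tmp.next;
        }
        // Second pass: look up the old next and random nodes in the map
        // and link the corresponding new nodes (get(null) returns null)
        tmp = head;
        Node newHead = map.get(tmp);
        Node newTmp = newHead;
        while (tmp != null) {
            newTmp.next = map.get(tmp.next);
            newTmp.random = map.get(tmp.random);
            tmp = tmp.next;
            newTmp = newTmp.next;
        }
        return newHead;
    }
}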

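6.3 Jewels and Stones

771. Jewels and Stones - LeetCode

Approach: traverse the jewels and put them into a set (jewel types are unique, duplicates are not needed, and a set makes the later lookups easy). Then traverse the stones and count each one that is in the set.

import java.util.HashSet;
import java.util.Set;

class Solution {
    public int numJewelsInStones(String jewels, String stones) {
        Set<Character> set = new HashSet<>();
        int cnt = 0;
        // Put the jewels into the set
        for (int i = 0; i < jewels.length(); i++) {
            set.add(jewels.charAt(i));
        }
        // Traverse the stones and count those that are jewels
        for (int i = 0; i < stones.length(); i++) {
            char ch = stones.charAt(i);
            if (set.contains(ch)) {
                cnt++;
            }
        }
        return cnt;
    }
}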

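6.4 Broken keyboard

Old Keyboard (20) - Nowcoder

The <br/> in the problem statement is just a line break and can be ignored. The first line is the intended input and the second line is what was actually typed.

Approach: broken letter keys are reported in upper case only, so convert the input to upper case. Store the actually typed characters in a set, which removes duplicates and makes lookups easy (as in Jewels and Stones, with the actual input playing the role of the jewels and the intended input the role of the stones). Traverse the intended input; any character not in the set belongs to a broken key. Unlike Jewels and Stones, each broken key must be reported only once, so the broken keys are kept in a second set that is checked before printing.

import java.util.Scanner;
import java.util.Set;
import java.util.HashSet;

public class Main {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        while (in.hasNextLine()) { // the while loop handles multiple test cases
            String str1 = in.nextLine(); // intended input
            String str2 = in.nextLine(); // actual input
            // Convert the actual input to upper case and put it into a set
            Set<Character> set = new HashSet<>();
            for (char ch : str2.toUpperCase().toCharArray()) {
                set.add(ch);
            }
            // Traverse the intended input (in upper case); a character missing from set
            // is a broken key, recorded once in brokenSet and printed
            Set<Character> brokenSet = new HashSet<>();
            for (char ch : str1.toUpperCase().toCharArray()) {
                if (!set.contains(ch) && !brokenSet.contains(ch)) {
                    brokenSet.add(ch);
                    System.out.print(ch);
                }
            }
        }
    }
}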

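6.5 Converting a binary search tree to a doubly linked list

Binary Search Tree and Doubly Linked List - Nowcoder

Analysis: perform an in-order traversal, replacing the print step with relinking the node. Keep track of the previously processed node, pre; when processing the current node, link it with pre.

public class Solution {
    private TreeNode head;
    private TreeNode pre;

    public TreeNode Convert(TreeNode pRootOfTree) {
        Convert2(pRootOfTree);
        return head;
    }

    public void Convert2(TreeNode pRootOfTree) {
        // Empty tree: nothing to relink
        if (pRootOfTree == null) {
            return;
        }
        Convert2(pRootOfTree.left); // traverse the left subtree
        // Relink the current node
        if (pre == null) { // the current node is the head of the list and has no pre
            head = pRootOfTree;
        } else {
            pRootOfTree.left = pre;
            pre.right = pRootOfTree;
        }
        pre = pRootOfTree; // update pre
        Convert2(pRootOfTree.right); // traverse the right subtree
    }
}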

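6.6 Top K Frequent Words

692. Top K Frequent Words - LeetCode

Analysis: words (unique) and their counts (which may repeat) form key-value pairs, so use a HashMap. Traverse the array: if the word is not in the map yet, add it with a count of 1; otherwise just increment its count. Finding the top k is done with a priority queue used as a min-heap of size k. An entry replaces the heap top only if it ranks higher: with different counts, the higher count ranks first; with equal counts, the lexicographically smaller word ranks first.

    public List<String> topKFrequent(String[] words, int k) {
        // Build the HashMap and count the words
        Map<String, Integer> map = new HashMap<>();
        for (String word : words) {
            // Missing: stored with the default count 0 + 1 = 1
            // Present: incremented by 1
            map.put(word, map.getOrDefault(word, 0) + 1);
        }
        // Top k: a priority queue of size k used as a min-heap.
        // Different counts: the smaller count is closer to the top;
        // equal counts: the lexicographically larger word is closer to the top
        Comparator<Map.Entry<String, Integer>> comparator = (a, b) ->
                !a.getValue().equals(b.getValue())
                        ? a.getValue() - b.getValue()
                        : b.getKey().compareTo(a.getKey());
        Queue<Map.Entry<String, Integer>> queue = new PriorityQueue<>(k, comparator);
        // A Map is not Iterable, so iterate over its entrySet()
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (queue.size() < k) { // the first k entries go straight into the heap
                queue.offer(entry);
            } else if (comparator.compare(entry, queue.peek()) > 0) {
                // Ranks higher than the heap top: replace the top
                queue.poll();
                queue.offer(entry);
            }
        }
        // Poll the k entries; the list comes out in ascending rank order
        List<String> list = new ArrayList<>(k);
        while (!queue.isEmpty()) {
            list.add(queue.poll().getKey());
        }
        // Reverse into descending rank order
        Collections.reverse(list);
        return list;
    }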

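6.7 Contains Duplicate

217. Contains Duplicate - LeetCode

Analysis: traverse the array once, storing the numbers in a Set. If the current number is already in the set, return true immediately; otherwise add it. If the traversal ends without finding a duplicate, return false.

    public boolean containsDuplicate(int[] nums) {
        Set<Integer> set = new HashSet<>();
        for (int num : nums) {
            if (set.contains(num)) {
                return true;
            }
            set.add(num);
        }
        return false;
    }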

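6.8 Contains Duplicate II

219. Contains Duplicate II - LeetCode

Analysis: traverse the array once, storing the numbers in a Map whose value is the index. If the current number is already in the map and the index difference satisfies the condition, return true immediately; otherwise store (or update) its index. If the traversal ends without a match, return false.

    public boolean containsNearbyDuplicate(int[] nums, int k) {
        Map<Integer, Integer> map = new HashMap<>(); // number -> most recent index
        for (int j = 0; j < nums.length; j++) {
            if (map.containsKey(nums[j]) && j - map.get(nums[j]) <= k) {
                return true;
            }
            map.put(nums[j], j);
        }
        return false;
    }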