Hash values and hash codes: What is hashing? How hash codes work, with examples

Published: 2024/1/1

Introduction to hashing

Hashing is designed to solve the problem of needing to efficiently find or store an item in a collection.

For example, if we have a list of 10,000 English words and we want to check whether a given word is in the list, it would be inefficient to compare the word with each of the 10,000 items in turn until we find a match. Even if the list of words is lexicographically sorted, as in a dictionary, you will still need some time to find the word you are looking for.

Hashing is a technique that makes such lookups more efficient by narrowing down the search at the outset.
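
To make the difference concrete, here is a small sketch comparing a one-by-one scan with a lookup in Python's built-in `set`, which is hash-based. The word list is made up purely for illustration (it is not a real English word list), and `linear_search` is a hypothetical helper name.

```python
def linear_search(words, target):
    """Compare target against each item in turn; return (found, comparisons)."""
    comparisons = 0
    for word in words:
        comparisons += 1
        if word == target:
            return True, comparisons
    return False, comparisons


# Illustrative data only: 10,000 made-up "words".
words = [f"word{i}" for i in range(10_000)]
word_set = set(words)  # Python's set is backed by a hash table

found, comparisons = linear_search(words, "word9999")
print(found, comparisons)      # True 10000
print("word9999" in word_set)  # True
```

The scan needs 10,000 comparisons to reach the last word, while the set uses the word's hash to go almost directly to the place where it would be stored.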

What is hashing?

Hashing means using some function or algorithm to map object data to some representative integer value.

This so-called hash code (or simply hash) can then be used as a way to narrow down our search when looking for the item in the map.

Generally, these hash codes are used to generate an index, at which the value is stored.

How hashing works

In hash tables, you store data in the form of key-value pairs. The key, which is used to identify the data, is given as input to the hashing function. The resulting hash code, which is an integer, is then mapped to an index within the fixed size of the table, typically by taking it modulo the number of slots.
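
As a minimal sketch of that mapping, assuming a table of 16 slots and using Python's built-in `hash()` as the hashing function (`TABLE_SIZE` and `index_for` are names chosen here for illustration):

```python
TABLE_SIZE = 16  # assumed number of slots, chosen only for illustration


def index_for(key, table_size=TABLE_SIZE):
    """Map a key to a slot index in the range [0, table_size)."""
    hash_code = hash(key)          # an integer; may be negative or very large
    return hash_code % table_size  # % with a positive divisor is never negative in Python


print(index_for(20))                       # 4, since hash(20) == 20 and 20 % 16 == 4
print(0 <= index_for("Cuba") < TABLE_SIZE) # True
```

Note that Python randomizes string hashes per process by default, so the index computed for a string such as "Cuba" can differ from one run to the next; only the range is guaranteed.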

Hash tables have to support three operations:

insert (key, value)
get (key)
delete (key)

Purely as an example to help us grasp the concept, let us suppose that we want to map a list of string keys to string values (for example, map a list of countries to their capital cities).

So let’s say we want to store the data in Table in the map.

And let us suppose that our hash function is to simply take the length of the string.

For simplicity, we will have two arrays: one for our keys and one for the values. To put an item in the hash table, we compute its hash code (in this case, simply count the number of characters), then put the key and value in the arrays at the corresponding index.

For example, Cuba has a hash code (length) of 4. So we store Cuba at index 4 of the keys array, and Havana at index 4 of the values array, and so on. And we end up with the following:

Now, in this specific example things work quite well. Our array needs to be big enough to accommodate the longest string, but in this case that’s only 11 slots. We do waste a bit of space because, for example, there are no 1-letter keys in our data, nor keys between 8 and 10 letters.

But in this case, the wasted space isn’t so bad either. Taking the length of a string is fast, and so is finding the value associated with a given key (certainly faster than doing up to five string comparisons).
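
The length-based scheme described above can be sketched roughly as follows. Only the Cuba/Havana pair comes from the text; the other pair is illustrative, and the names `length_hash`, `put`, `get` and `SLOTS` are assumptions. With zero-based arrays, 12 slots are allocated so that a key of 11 characters still fits, leaving index 0 unused.

```python
def length_hash(key):
    """The toy hash function from the text: the number of characters in the key."""
    return len(key)


SLOTS = 12  # indices 0..11; assumes no key is longer than 11 characters

keys = [None] * SLOTS
values = [None] * SLOTS


def put(key, value):
    index = length_hash(key)
    keys[index] = key       # no collision handling: an existing entry is overwritten
    values[index] = value


def get(key):
    index = length_hash(key)
    if keys[index] == key:  # one comparison confirms the slot holds this key
        return values[index]
    return None


put("Cuba", "Havana")       # pair taken from the text
put("Spain", "Madrid")      # illustrative pair, not from the original data

print(length_hash("Cuba"))  # 4
print(get("Cuba"))          # Havana
print(get("Peru"))          # None (same length as "Cuba", but a different key)
```

A key longer than 11 characters would raise an `IndexError` here, and a second key of the same length would silently replace the first.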

But what do we do if our dataset has a string with more than 11 characters? And what if we have another five-character word, “India”, and try to assign it to an index using our hash function? Since index 5 is already occupied, we have to decide what to do with it. This is called a collision.

If our dataset had a string with a thousand characters, and we made an array with a thousand slots to store the data, most of that space would be wasted. And if our keys were random English words, many of which have the same length, using length as a hash function would be fairly useless.
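
To see how quickly a length-based hash produces collisions, the sketch below groups a handful of words by their length. The word list is illustrative only, and each group represents keys that would compete for the same index.

```python
from collections import defaultdict

# Illustrative words only; other everyday word lists behave similarly.
words = ["India", "Spain", "Italy", "Japan", "Kenya", "Cuba", "Peru", "France"]

buckets = defaultdict(list)
for word in words:
    buckets[len(word)].append(word)  # length used as the hash code

for length in sorted(buckets):
    print(length, buckets[length])
# 4 ['Cuba', 'Peru']
# 5 ['India', 'Spain', 'Italy', 'Japan', 'Kenya']
# 6 ['France']
```

Eight keys end up sharing just three indices, which is why a practical hash function needs to spread keys far more evenly.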

Collision Handling

Two basic methods are used to handle collisions.

Separate Chaining
Open Addressing

Separate Chaining

Separate chaining handles hash collisions by storing an additional data structure, preferably a linked list (for dynamic allocation), in each bucket. In our example, when we add India to the dataset, it is appended to the linked list stored at index 5, and our table would then look like this.

To find an item, we first go to its bucket and then compare keys. This is a popular method, and if linked lists are used the hash table never fills up. The cost of get(k) is on average O(n), where n is the number of keys in the bucket; in the worst case every key lands in the same bucket and n equals N, the total number of keys.

The problem with separate chaining is that the data structure can grow without bound.
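
A minimal sketch of separate chaining might look like the following. It is not a production implementation: Python lists of (key, value) pairs stand in for the linked lists, the class name `ChainedHashTable` and the default of 8 buckets are assumptions, and the table never resizes.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by separate chaining."""

    def __init__(self, num_buckets=8):
        # Each bucket is a list of (key, value) pairs standing in for a linked list.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: replace its value
                return
        bucket.append((key, value))       # new key: append to the chain

    def get(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable(num_buckets=1)  # a single bucket forces every key to collide
table.insert("Spain", "Madrid")          # illustrative pair
table.insert("India", "New Delhi")
print(table.get("India"))     # New Delhi
print(table.get("Spain"))     # Madrid
print(table.delete("Spain"))  # True
print(table.get("Spain"))     # None
```

Using a single bucket in the usage example is deliberate: both keys collide, yet each can still be found by comparing keys within the chain.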

Open Addressing

Open addressing does not introduce any new data structure. If a collision occurs, we look for a free slot at the next position generated by a probing algorithm. Open addressing is generally used where storage space is restricted, e.g. on embedded processors. Open addressing is not necessarily faster than separate chaining.
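
As a rough sketch of open addressing, the class below tries the next slot, one step at a time, whenever the home slot is taken. The class name `LinearProbingTable`, the capacity of 8 and the use of `None` as the "empty" marker are assumptions made for illustration. Deletion is left out, since simply clearing a slot would break later lookups that need to probe past it.

```python
class LinearProbingTable:
    """Open-addressing hash table; None marks an empty slot, so None cannot be a key."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def insert(self, key, value):
        start = hash(key) % self.capacity
        for step in range(self.capacity):
            index = (start + step) % self.capacity  # wrap around at the end
            if self.keys[index] is None or self.keys[index] == key:
                self.keys[index] = key
                self.values[index] = value
                return index                        # slot where the pair was stored
        raise RuntimeError("table is full")

    def get(self, key):
        start = hash(key) % self.capacity
        for step in range(self.capacity):
            index = (start + step) % self.capacity
            if self.keys[index] is None:
                return None                         # empty slot reached: key is absent
            if self.keys[index] == key:
                return self.values[index]
        return None


# Small integers hash to themselves in Python, which keeps this demo deterministic.
table = LinearProbingTable(capacity=8)
print(table.insert(5, "a"))   # 5
print(table.insert(13, "b"))  # 6, because 13 % 8 == 5 is taken and the next slot is free
print(table.get(13))          # b
print(table.get(21))          # None
```

Unlike separate chaining, this table can fill up: once all eight slots are used, `insert` raises an error.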

Methods for Open Addressing

Linear Probing
Quadratic Probing
Double Hashing
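
The three methods differ only in how the next slot to try is computed. The functions below show one common textbook formulation of each; the function names, the table capacity of 11 and the `step` value are assumptions chosen for illustration, and real implementations vary in the details.

```python
def linear_probe(home, attempt, capacity):
    """Try consecutive slots: home, home + 1, home + 2, ..."""
    return (home + attempt) % capacity


def quadratic_probe(home, attempt, capacity):
    """Offsets grow with the square of the attempt number: +0, +1, +4, +9, ..."""
    return (home + attempt * attempt) % capacity


def double_hash_probe(home, attempt, capacity, step):
    """Step size comes from a second hash of the key and must be non-zero."""
    return (home + attempt * step) % capacity


capacity = 11
home = 3  # index produced by the primary hash function

print([linear_probe(home, i, capacity) for i in range(5)])
# [3, 4, 5, 6, 7]
print([quadratic_probe(home, i, capacity) for i in range(5)])
# [3, 4, 7, 1, 8]
print([double_hash_probe(home, i, capacity, step=5) for i in range(5)])
# [3, 8, 2, 7, 1]
```

In each case `attempt` counts how many collisions have occurred so far for the key being placed, starting from 0 for the home slot.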

How to use hashing in your code

Python

# A few languages, such as Python and Ruby, come with built-in hash table support.

# Declaration
my_hash_table = {}
my_hash_table = dict()

# Example key and value (placeholders for illustration)
key, value = "Cuba", "Havana"

# Insertion
my_hash_table[key] = value

# Look up
value = my_hash_table.get(key)  # returns None if the key is not present
value = my_hash_table[key]      # raises a KeyError if the key is not present

# Deletion
del my_hash_table[key]          # raises a KeyError if the key is not present

# Getting all keys and values stored in the dictionary
keys = my_hash_table.keys()
values = my_hash_table.values()

Java

// HashMap is part of the standard library, but it has to be imported from java.util.
import java.util.HashMap;

public class HashMapExample {
    public static void main(String[] args) {
        // Example key and value (placeholders for illustration)
        Integer key = 1;
        Integer value = 100;

        // Declaration
        HashMap<Integer, Integer> myHashTable = new HashMap<Integer, Integer>(); // declares an empty map

        // Insertion
        myHashTable.put(key, value);

        // Deletion
        myHashTable.remove(key);

        // Look up
        myHashTable.get(key);         // returns null if the key is not present
        myHashTable.containsKey(key); // returns a boolean indicating whether the key is present

        // Number of key-value pairs in the hash table
        myHashTable.size();
    }
}