国产av日韩一区二区三区精品,成人性爱视频在线观看,国产,欧美,日韩,一区,www.成色av久久成人,2222eeee成人天堂

Table of Contents
深入理解PHP內(nèi)核(六)哈希表以及PHP的哈希表實(shí)現(xiàn),深入理解
Home Backend Development PHP Tutorial In-depth understanding of PHP kernel (6) hash table and PHP hash table implementation, in-depth understanding_PHP tutorial

In-depth understanding of PHP kernel (6) hash table and PHP hash table implementation, in-depth understanding_PHP tutorial

Jul 12, 2016 am 08:55 AM
php

深入理解PHP內(nèi)核(六)哈希表以及PHP的哈希表實(shí)現(xiàn),深入理解

原文鏈接:http://www.orlion.ga/241/

一、哈希表(HashTable)

? ? 大部分動(dòng)態(tài)語(yǔ)言的實(shí)現(xiàn)中都使用了哈希表,哈希表是一種通過(guò)哈希函數(shù),將特定的鍵映射到特定值得一種數(shù)據(jù)

?

結(jié)構(gòu),它維護(hù)鍵和值之間一一對(duì)應(yīng)關(guān)系。

鍵(key):用于操作數(shù)據(jù)的標(biāo)示,例如PHP數(shù)組中的索引或者字符串鍵等等。

槽(slot/bucket):哈希表中用于保存數(shù)據(jù)的一個(gè)單元,也就是數(shù)組真正存放的容器。

哈希函數(shù)(hash function):將key映射(map)到數(shù)據(jù)應(yīng)該存放的slot所在位置的函數(shù)。

哈希沖突(hash collision):哈希函數(shù)將兩個(gè)不同的key映射到同一個(gè)索引的情況。

?

? ? 目前解決hash沖突的方法有兩種:鏈接法和開放尋址法。

?

1、沖突解決

? ? (1)鏈接法

? ? 鏈接法通過(guò)使用一個(gè)鏈表來(lái)保存slot值的方式來(lái)解決沖突,也就是當(dāng)不同的key映射到一個(gè)槽中的時(shí)候使用鏈表

?

來(lái)保存這些值。(PHP中正是使用了這種方式);

? ? (2)開放尋址法

? ? 使用開放尋址法是槽本身直接存放數(shù)據(jù),在插入數(shù)據(jù)時(shí)如果key所映射到的索引已經(jīng)有數(shù)據(jù)了,這說(shuō)明有沖突,

?

這時(shí)會(huì)尋找下一個(gè)槽,如果該槽也被占用了則繼續(xù)尋找下一個(gè)槽,直到找到?jīng)]有被占用的槽,在查找時(shí)也是這樣

?

2、哈希表的實(shí)現(xiàn)

? ? 哈希表的實(shí)現(xiàn)主要完成的工作只有三點(diǎn):

? ? * 實(shí)現(xiàn)哈希函數(shù)

? ? * 沖突的解決

? ? * 操作接口的實(shí)現(xiàn)

(1)數(shù)據(jù)結(jié)構(gòu)

????首先需要一個(gè)容器來(lái)曹村我們的哈希表,哈希表需要保存的內(nèi)容主要是保存進(jìn)來(lái)的數(shù)據(jù),同時(shí)為了方便的得知哈希表中存儲(chǔ)的元素個(gè)數(shù),需要保存一個(gè)大小字段,第二個(gè)需要的就是保存數(shù)據(jù)的容器。下面將實(shí)現(xiàn)一個(gè)簡(jiǎn)易的哈希表,基本的數(shù)據(jù)結(jié)構(gòu)主要有兩個(gè),一個(gè)用于保存哈希表本身,另外一個(gè)就是用于實(shí)際保存數(shù)據(jù)的單鏈表了,定義如下:

typedef struct _Bucket
{
    char *key;
    void *value;
    struct _Bucket *next;
 
} Bucket;
 
typedef struct _HashTable
{
    int size;
    Bucket* buckets;
} HashTable;

上邊的定義與PHP中的實(shí)現(xiàn)相似,為了簡(jiǎn)化key的數(shù)據(jù)類型為字符串,而存儲(chǔ)的結(jié)構(gòu)可以為任意類型。

Bucket結(jié)構(gòu)體是一個(gè)單鏈表,這是為了解決哈希沖突。當(dāng)多個(gè)key映射到同一個(gè)index的時(shí)候?qū)_突的元素鏈接起來(lái)

(2)哈希函數(shù)實(shí)現(xiàn)

我們采用一種最簡(jiǎn)單的哈希算法實(shí)現(xiàn):將key字符串的所有字符加起來(lái),然后以結(jié)果對(duì)哈希表的大小取模,這樣索引就能落在數(shù)組索引的范圍之內(nèi)了。

static int hash_str(char *key)
{
    int hash = 0;
 
    char *cur = key;
 
    while(*(cur++) != '\0') {
        hash += *cur;
    }
 
    return hash;
}
 
// ?使用這個(gè)宏來(lái)求得key在哈希表中的索引
#define HASH_INDEX(ht, key) (hash_str((key)) % (ht)->size)

PHP使用的哈希算法稱為DJBX33A。為了操作哈希表定義了如下幾個(gè)操作函數(shù):

int hash_init(HashTable *ht);                               // 初始化哈希表
int hash_lookup(HashTable *ht, char *key, void **result);   // 根據(jù)key查找內(nèi)容
int hash_insert(HashTable *ht, char *key, void *value);     // 將內(nèi)容插哈希表中
int hash_remove(HashTable *ht, char *key);                  // 刪除key所指向的內(nèi)容
int hash_destroy(HashTable *ht);

下面以插入和獲取操作函數(shù)為例:

int hash_insert(HashTable *ht, char *key, void *value)
{
    // check if we need to resize the hashtable
    resize_hash_table_if_needed(ht);    // 哈希表不固定大小,當(dāng)插入的內(nèi)容快占滿哈希表的存儲(chǔ)空間
                                        // 將對(duì)哈希表進(jìn)行擴(kuò)容,以便容納所有的元素
    int index = HASH_INDEX(ht, key);    // 找到key所映射到的索引
 
    Bucket *org_bucket = ht->buckets[index];
    Bucket *bucket = (Bucket *)malloc(sizeof(Bucket)); // 為新元素申請(qǐng)空間
 
    bucket->key   = strdup(key);
    // 將值內(nèi)容保存起來(lái),這里只是簡(jiǎn)單的將指針指向要存儲(chǔ)的內(nèi)容,而沒(méi)有將內(nèi)容復(fù)制
    bucket->value = value;  
 
    LOG_MSG("Insert data p: %p\n", value);
 
    ht->elem_num += 1; // 記錄一下現(xiàn)在哈希表中的元素個(gè)數(shù)
 
    if(org_bucket != NULL) { // 發(fā)生了碰撞,將新元素放置在鏈表的頭部
        LOG_MSG("Index collision found with org hashtable: %p\n", org_bucket);
        bucket->next = org_bucket;
    }
 
    ht->buckets[index]= bucket;
 
    LOG_MSG("Element inserted at index %i, now we have: %i elements\n",
        index, ht->elem_num);
 
    return SUCCESS;
}

在查找時(shí)首先找到元素所在的位置,如果存在元素,則將鏈表中的所有元素的key和要查找的key依次對(duì)比,直到找到一致的元素,否則說(shuō)明該值沒(méi)有匹配的內(nèi)容。

int hash_lookup(HashTable *ht, char *key, void **result)
{
    int index = HASH_INDEX(ht, key);
    Bucket *bucket = ht->buckets[index];
     if(bucket == NULL) return FAILED;
 
    // 查找這個(gè)鏈表以便找到正確的元素,通常這個(gè)鏈表應(yīng)該是只有一個(gè)元素的,也就不同多次循環(huán)
    // 要保證這一點(diǎn)需要有一個(gè)合適的哈希算法。
    while(bucket)
    {
        if(strcmp(bucket->key, key) == 0)
        {
            LOG_MSG("HashTable found key in index: %i with  key: %s value: 
%p\n",
                index, key, bucket->value);
            *result = bucket->value;    
            return SUCCESS;
        }
 
        bucket = bucket->next;
    }
 
    LOG_MSG("HashTable lookup missed the key: %s\n", key);
    return FAILED;
}

PHP中的數(shù)組是基于哈希表實(shí)現(xiàn)的,依次給數(shù)組添加元素時(shí),元素之間是有順序的,而這里的哈希表在物理上顯然是接近平均分布的,這樣是無(wú)法根據(jù)插入的先后順序獲取到這些元素的,在PHP的實(shí)現(xiàn)中Bucket結(jié)構(gòu)體還維護(hù)了另一個(gè)指針字段來(lái)維護(hù)元素之間的關(guān)系。

二、PHP的哈希表實(shí)現(xiàn)

1、PHP的哈希實(shí)現(xiàn)

PHP中的哈希表是十分重要的一個(gè)數(shù)據(jù)接口,基本上大部分的語(yǔ)言特征都是基于哈希表的,例如:變量的作用域和變量的存儲(chǔ),類的實(shí)現(xiàn)以及Zend引擎內(nèi)部的數(shù)據(jù)有很多都是保存在哈希表中的。

(1)數(shù)據(jù)結(jié)構(gòu)及說(shuō)明

Zend為了保存數(shù)據(jù)之間的關(guān)系使用了雙向鏈表來(lái)保存數(shù)據(jù)

(2)哈希表結(jié)構(gòu)

PHP中的哈希表實(shí)現(xiàn)在Zend/zend_hash.c中,PHP使用如下兩個(gè)數(shù)據(jù)結(jié)構(gòu)來(lái)實(shí)現(xiàn)哈希表,HashTable結(jié)構(gòu)體用于保存整個(gè)哈希表需要的基本信息,而Bucket結(jié)構(gòu)體用于保存具體的數(shù)據(jù)內(nèi)容,如下:

typedef struct _hashtable { 
    uint nTableSize;        // hash Bucket的大小,最小為8,以2x增長(zhǎng)
    uint nTableMask;        // nTableSize-1,索引取值的優(yōu)化
    uint nNumOfElements;    // hash Bucket中當(dāng)前存在的元素個(gè)數(shù),count()函數(shù)會(huì)直接返回此值
    ulong nNextFreeElement; // 下一個(gè)數(shù)字索引的位置
    Bucket *pInternalPointer;   // 當(dāng)前遍歷的指針(foreach 比f(wàn)or快的原因之一)
    Bucket *pListHead;          // 存儲(chǔ)數(shù)頭元素指針
    Bucket *pListTail;          // 存儲(chǔ)數(shù)組尾元素指針
    Bucket **arBuckets;         // 存儲(chǔ)hash數(shù)組
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount; // 標(biāo)記當(dāng)前hash Bucket被遞歸訪問(wèn)的次數(shù)(防止多次遞歸)
    zend_bool bApplyProtection;// 標(biāo)記當(dāng)前hash桶允許不允許多次訪問(wèn),不允許時(shí),最多只能遞歸3此
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

nTableSize字段用于標(biāo)示哈希表的容量,哈希表的初始化容量最小為8.首先看看哈希表的初始化函數(shù):

ZEND_API int _zend_hash_init(HashTable *ht, uint nSize, hash_func_t 
pHashFunction,
                    dtor_func_t pDestructor, zend_bool persistent 
ZEND_FILE_LINE_DC)
{
    uint i = 3;
    //...
    if (nSize >= 0x80000000) {
        /* prevent overflow */
        ht->nTableSize = 0x80000000;
    } else {
        while ((1U << i) < nSize) {
            i++;
        }
        ht->nTableSize = 1 << i;
    }
    // ...
    ht->nTableMask = ht->nTableSize - 1;
 
    /* Uses ecalloc() so that Bucket* == NULL */
    if (persistent) {
        tmp = (Bucket **) calloc(ht->nTableSize, sizeof(Bucket *));
        if (!tmp) {
            return FAILURE;
        }
        ht->arBuckets = tmp;
    } else {
        tmp = (Bucket **) ecalloc_rel(ht->nTableSize, sizeof(Bucket *));
        if (tmp) {
            ht->arBuckets = tmp;
        }
    }
 
    return SUCCESS;
}

例如如果設(shè)置初始大小為10,則上面的算法將會(huì)將大小調(diào)整為16.也就是始終將大小調(diào)整為接近初始大小的2的整數(shù)次方

為什么這么調(diào)整呢?先看看HashTable將哈希值映射到槽位的方法:

h = zend_inline_hash_func(arKey, nKeyLength);
nIndex = h & ht->nTableMask;

從上邊的_zend_hash_init()函數(shù)中可知,ht->nTableMask的大小為ht->nTableSize – 1。這里使用&操作而不是使用取模,這是因?yàn)橄鄬?duì)來(lái)說(shuō)取模的操作的消耗和按位與的操作大很多。

設(shè)置好了哈希表的大小后就需要為哈希表申請(qǐng)存儲(chǔ)空間了,如上邊初始化的代碼,根據(jù)是否需要持久保存而調(diào)用了不同的內(nèi)存申請(qǐng)方法,是需要持久體現(xiàn)的是在前面PHP生命周期里介紹的:持久內(nèi)容能在多個(gè)請(qǐng)求之間可訪問(wèn),而如果是非持久存儲(chǔ)則會(huì)在在請(qǐng)求結(jié)束時(shí)釋放占用的空間。具體內(nèi)容將在內(nèi)存管理中詳解

HashTable中的nNumOfElements字段很好理解,每插入一個(gè)元素或者unset刪掉元素時(shí)會(huì)更新這個(gè)字段,這樣在進(jìn)行count()函數(shù)統(tǒng)計(jì)數(shù)組元素個(gè)數(shù)時(shí)就能快速的返回。

nNextFreeElement字段非常有用,先看一段PHP代碼:

<?php
$a = array(10 => 'Hello');
$a[] = 'TIPI';
var_dump($a);
 
// ouput
array(2) {
  [10]=>
  string(5) "Hello"
  [11]=>
  string(5) "TIPI"
}

PHP中可以不指定索引值向數(shù)組中添加元素,這時(shí)將默認(rèn)使用數(shù)字作為索引,和C語(yǔ)言中的枚舉類似,而這個(gè)元素的索引到底是多個(gè)就由nNextFreeElement字段決定了。如果數(shù)組中存在了數(shù)字key,則會(huì)默認(rèn)使用最新使用的key+1,如上例中已經(jīng)存在了10作為key的元素,這樣新插入的默認(rèn)索引就為11了。

下面看看保存哈希表數(shù)據(jù)的槽位數(shù)據(jù)結(jié)構(gòu)體:

typedef struct bucket {
    ulong h;            // 對(duì)char *key進(jìn)行hash后的值,或者是用戶指定的數(shù)字索引值
    uint nKeyLength;    // hash關(guān)鍵字的長(zhǎng)度,如果數(shù)組索引為數(shù)字,此值為0
    void *pData;        // 指向value,一般是用戶數(shù)據(jù)的副本,如果是指針數(shù)據(jù),則指向pDataPtr
    void *pDataPtr;     // 如果是指針數(shù)組,此值會(huì)指向真正的value,同時(shí)上面pData會(huì)指向此值
    struct bucket *pListNext;   // 整個(gè)hash表的下一個(gè)元素
    struct bucket *pListLast;   // 整個(gè)hash表的上一個(gè)元素
    struct bucket *pNext;       // 存放在同一個(gè)hash Bucket內(nèi)的下一個(gè)元素
    struct bucket *pLast;       // 存放在同一個(gè)hash Bucket內(nèi)的上一個(gè)元素
    char arKey[1];  
    /*
    存儲(chǔ)字符索引,此項(xiàng)必須放在最末尾,因?yàn)榇颂幹欢x了1個(gè)字節(jié),存儲(chǔ)的實(shí)際上是指向char *key的值,
    這就意味著可以省去再賦值一次的消耗,而且,有時(shí)此值并不需要,所以同時(shí)還節(jié)省了空間。
    */
} Bucket;

????如上面各字段的注釋。h字段保存哈希表key哈希后的值。在PHP中可以使用字符串或者數(shù)字作為數(shù)組的索引。因?yàn)閿?shù)字的索引是唯一的。如果再進(jìn)行一次哈希將會(huì)極大的浪費(fèi)。h字段后面的nKeyLength字段是作為key長(zhǎng)度的標(biāo)示,如果索引是數(shù)字的話,則nKeyLength為0.在PHP中定義數(shù)組時(shí)如果字符串可以被轉(zhuǎn)換成數(shù)字也會(huì)進(jìn)行轉(zhuǎn)換。所以在PHP中例如'10','11'這類的字符索引和數(shù)字索引10,11沒(méi)有區(qū)別

  • Bucket結(jié)構(gòu)體維護(hù)了兩個(gè)雙向鏈表,pNext和pLast指針?lè)謩e指向本槽位所在的鏈表的關(guān)系

  • 而pListNext和pListLast指針指向的則是整個(gè)哈希表所有的數(shù)據(jù)之間的鏈接關(guān)系。HashTable結(jié)構(gòu)體中的pListHead和pListTail則維護(hù)整個(gè)哈希表的頭元素指針和最后一個(gè)元素的指針

    ?

?

????哈希表的操作接口:

????PHP提供了如下幾類操作接口:

  • 初始化操作,例如zend_hash_init()函數(shù),用于初始化哈希表接口,分配空間等。

  • 查找,插入,刪除和更新操作接口,這是比較常規(guī)的操作。

  • 迭代和循環(huán),這類的接口用于循環(huán)對(duì)哈希表進(jìn)行操作。

  • 復(fù)制,排序,倒置和銷毀等操作。

www.bkjia.comtruehttp://www.bkjia.com/PHPjc/1115246.htmlTechArticle深入理解PHP內(nèi)核(六)哈希表以及PHP的哈希表實(shí)現(xiàn),深入理解 原文鏈接:http://www.orlion.ga/241/ 一、哈希表(HashTable) 大部分動(dòng)態(tài)語(yǔ)言的實(shí)現(xiàn)中都使...
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undress AI Tool

Undress AI Tool

Undress images for free

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to get the current session ID in PHP? How to get the current session ID in PHP? Jul 13, 2025 am 03:02 AM

The method to get the current session ID in PHP is to use the session_id() function, but you must call session_start() to successfully obtain it. 1. Call session_start() to start the session; 2. Use session_id() to read the session ID and output a string similar to abc123def456ghi789; 3. If the return is empty, check whether session_start() is missing, whether the user accesses for the first time, or whether the session is destroyed; 4. The session ID can be used for logging, security verification and cross-request communication, but security needs to be paid attention to. Make sure that the session is correctly enabled and the ID can be obtained successfully.

PHP get substring from a string PHP get substring from a string Jul 13, 2025 am 02:59 AM

To extract substrings from PHP strings, you can use the substr() function, which is syntax substr(string$string,int$start,?int$length=null), and if the length is not specified, it will be intercepted to the end; when processing multi-byte characters such as Chinese, you should use the mb_substr() function to avoid garbled code; if you need to intercept the string according to a specific separator, you can use exploit() or combine strpos() and substr() to implement it, such as extracting file name extensions or domain names.

How do you perform unit testing for php code? How do you perform unit testing for php code? Jul 13, 2025 am 02:54 AM

UnittestinginPHPinvolvesverifyingindividualcodeunitslikefunctionsormethodstocatchbugsearlyandensurereliablerefactoring.1)SetupPHPUnitviaComposer,createatestdirectory,andconfigureautoloadandphpunit.xml.2)Writetestcasesfollowingthearrange-act-assertpat

How to split a string into an array in PHP How to split a string into an array in PHP Jul 13, 2025 am 02:59 AM

In PHP, the most common method is to split the string into an array using the exploit() function. This function divides the string into multiple parts through the specified delimiter and returns an array. The syntax is exploit(separator, string, limit), where separator is the separator, string is the original string, and limit is an optional parameter to control the maximum number of segments. For example $str="apple,banana,orange";$arr=explode(",",$str); The result is ["apple","bana

JavaScript Data Types: Primitive vs Reference JavaScript Data Types: Primitive vs Reference Jul 13, 2025 am 02:43 AM

JavaScript data types are divided into primitive types and reference types. Primitive types include string, number, boolean, null, undefined, and symbol. The values are immutable and copies are copied when assigning values, so they do not affect each other; reference types such as objects, arrays and functions store memory addresses, and variables pointing to the same object will affect each other. Typeof and instanceof can be used to determine types, but pay attention to the historical issues of typeofnull. Understanding these two types of differences can help write more stable and reliable code.

Using std::chrono in C Using std::chrono in C Jul 15, 2025 am 01:30 AM

std::chrono is used in C to process time, including obtaining the current time, measuring execution time, operation time point and duration, and formatting analysis time. 1. Use std::chrono::system_clock::now() to obtain the current time, which can be converted into a readable string, but the system clock may not be monotonous; 2. Use std::chrono::steady_clock to measure the execution time to ensure monotony, and convert it into milliseconds, seconds and other units through duration_cast; 3. Time point (time_point) and duration (duration) can be interoperable, but attention should be paid to unit compatibility and clock epoch (epoch)

How does PHP handle Environment Variables? How does PHP handle Environment Variables? Jul 14, 2025 am 03:01 AM

ToaccessenvironmentvariablesinPHP,usegetenv()orthe$_ENVsuperglobal.1.getenv('VAR_NAME')retrievesaspecificvariable.2.$_ENV['VAR_NAME']accessesvariablesifvariables_orderinphp.iniincludes"E".SetvariablesviaCLIwithVAR=valuephpscript.php,inApach

How to pass a session variable to another page in PHP? How to pass a session variable to another page in PHP? Jul 13, 2025 am 02:39 AM

In PHP, to pass a session variable to another page, the key is to start the session correctly and use the same $_SESSION key name. 1. Before using session variables for each page, it must be called session_start() and placed in the front of the script; 2. Set session variables such as $_SESSION['username']='JohnDoe' on the first page; 3. After calling session_start() on another page, access the variables through the same key name; 4. Make sure that session_start() is called on each page, avoid outputting content in advance, and check that the session storage path on the server is writable; 5. Use ses

See all articles