CDictionary的实际工作原理

程序员有二十年 2024-09-24 10:40:48

Dictionary<TKey, TValue>是 C# 中非常流行的数据结构,也是面试问题的热门选择。我已经使用了 10 亿次,我非常确定我了解它们的工作原理。但是,当我更深入地研究它们并检查实际代码时,我发现它们比我想象的还要好(也许您也是如此)。在本文中,我们将一起进行深入研究,甚至编写我们自己的词典教育副本。所以和我一起开始吧!Dictionary

为了确保我们的副本与实际代码匹配,我们将首先探索原始代码中的内容,然后删除所有不必要的内容,并在需要的地方添加日志 ()。只需复制两个主要方法就足够了:Add 和 GetValueOrDefault 来重新创建 的所有基本要素,因此这就是我们要执行的操作。但首先,让我们看看 a 中的字段 :Console.WriteLineDictionaryDictionary

我将使用参考源中的 .NET Framework 4.8 中的代码。尽管现代 .NET 代码稍微复杂一些,但基本要素仍然相同,因此 Framework 版本应该可以。

private struct Entry { public int hashCode; public int next; public TKey key; public TValue value; } private int[] buckets; private Entry[] entries; private int count; private int version; private int freeList; private int freeCount; private IEqualityComparer<TKey> comparer;

和 properties 是优化技术的一部分,对于 a 的运行不是必需的 - 我们将把它们传递给我们的 replica。 只是一个 changes 计数器,我们也会传递它。为简单起见,我们还将使用而不是允许外部配置。添加打印当前状态(所有属性的值)的方法,我们将得到:freeListfreeCountDictionaryversionEqualityComparer<TKey>.Default

public EducationalDictionary<TKey, TValue>{private struct Entry {public int hashCode;public int next;public TKey key;public TValue? value;string ValueString => value == ? "" : value!.ToString()!;public override string ToString() {return $"{key} - {ValueString} + (next = {next})"; } }private int[] buckets;private Entry[] entries;private int count;private readonly EqualityComparer<TKey> comparer = EqualityComparer<TKey>.Default;public void PrintFullState(string preface) { Console.WriteLine(); Console.Write(this.ToString(preface)); }public string ToString(string preface) {StringBuilder result = new(); result.AppendLine($"{preface}, state:"); result.AppendLine(); result.AppendLine($"buckets: [{String.Join(", ", buckets!)}]"); result.AppendLine($"entries:");for (int i = 0; i < entries.Length; i++) { result.AppendLine($" [{i}] = {entries[i]}"); } result.AppendLine($"count: {count}");return result.ToString(); }}

源代码中的方法实质上是对另一个方法的调用:Add

public void Add(TKey key, TValue value) { Insert(key, value, true); }

该方法可能看起来很可怕,但别担心,我们会一点一点地剖析它:Insert

private void Insert(TKey key, TValue value, bool add) {if( key == ) { ThrowHelper.ThrowArgumentException(ExceptionArgument.key); }if (buckets == ) Initialize(0);int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;int targetBucket = hashCode % buckets.Length;#if FEATURE_RANDOMIZED_STRING_HASHINGint collisionCount = 0;#endiffor (int i = buckets[targetBucket]; i >= 0; i = entries[i].next) {if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) {if (add) { ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_AddingDuplicate); } entries[i].value = value; version++;return; } #if FEATURE_RANDOMIZED_STRING_HASHING collisionCount++;#endif }int index;if (freeCount > 0) { index = freeList; freeList = entries[index].next; freeCount--; }else {if (count == entries.Length) {Resize(); targetBucket = hashCode % buckets.Length; } index = count; count++; } entries[index].hashCode = hashCode; entries[index].next = buckets[targetBucket]; entries[index].key = key; entries[index].value = value; buckets[targetBucket] = index; version++;#if FEATURE_RANDOMIZED_STRING_HASHING#if FEATURE_CORECLR// In case we hit the collision threshold we'll need to switch to the comparer which is using randomized string hashing// in this case will be EqualityComparer<string>.Default.// Note, randomized string hashing is turned on by default on coreclr so EqualityComparer<string>.Default will // be using randomized string hashingif (collisionCount > HashHelpers.HashCollisionThreshold && comparer == NonRandomizedStringEqualityComparer.Default) { comparer = (IEqualityComparer<TKey>) EqualityComparer<string>.Default;Resize(entries.Length, true); }#elseif(collisionCount > HashHelpers.HashCollisionThreshold && HashHelpers.IsWellKnownEqualityComparer(comparer)) { comparer = (IEqualityComparer<TKey>) HashHelpers.GetRandomizedEqualityComparer(comparer);Resize(entries.Length, true); }#endif // FEATURE_CORECLR#endif }

它还调用其他方法:

private void Initialize(int capacity) {int size = HashHelpers.GetPrime(capacity); buckets = new int[size];for (int i = 0; i < buckets.Length; i++) buckets[i] = -1; entries = new Entry[size]; freeList = -1;}private void Resize(int newSize, bool forceNewHashCodes) { Contract.Assert(newSize >= entries.Length);int[] newBuckets = new int[newSize];for (int i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;Entry[] newEntries = new Entry[newSize]; Array.Copy(entries, 0, newEntries, 0, count);if(forceNewHashCodes) {for (int i = 0; i < count; i++) {if(newEntries[i].hashCode != -1) { newEntries[i].hashCode = (comparer.GetHashCode(newEntries[i].key) & 0x7FFFFFFF); } } }for (int i = 0; i < count; i++) {if (newEntries[i].hashCode >= 0) {int bucket = newEntries[i].hashCode % newSize; newEntries[i].next = newBuckets[bucket]; newBuckets[bucket] = i; } } buckets = newBuckets; entries = newEntries;}

最后,它们调用 上的方法。我们将复制这些,因为如果您不处理边缘情况,它们将非常微不足道。我们的版本将如下所示:HashHelpers

public static HashHelpers{public static readonly int[] primes = {3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369};public static int GetPrime(int min) {foreach (var prime in primes) {if (prime >= min) return prime; }return min; }public static int ExpandPrime(int oldSize) {return GetPrime(2 * oldSize); }}

因为我们将在调整大小完成后删除 logic 和 print state 。老实说,代码并不是很重要,它本质上是重新排列和针对新大小。这是我们的结果:Resizeif (forceNewHashCodes)entriesbuckets

private void Resize(int newSize) {var newBuckets = new int[newSize];for (var i = 0; i < newBuckets.Length; i++) newBuckets[i] = -1;var newEntries = new Entry[newSize]; Array.Copy(entries, 0, newEntries, 0, count);for (var i = 0; i < count; i++) {if (newEntries[i].hashCode >= 0) {var bucket = newEntries[i].hashCode % newSize; newEntries[i].next = newBuckets[bucket]; newBuckets[bucket] = i; } } buckets = newBuckets; entries = newEntries;PrintFullState("\u2194\ufe0f Resize");}

我们将初始化逻辑从 straight 移动到构造函数(因为它将更容易处理可空性):if (buckets == ) Initialize(0);

以及初始化完成后的打印状态

public EducationalDictionary(){var size = HashHelpers.GetPrime(0); buckets = new int[size];for (var i = 0; i < buckets.Length; i++) buckets[i] = -1; entries = new Entry[size];PrintFullState("\ud83d\ude80 Initialized");}

以下是我们将执行的更改列表:Add

删除参数验证

删除#FEATURE_RANDOMIZED_STRING_HASHING

删除 logic under,因为我们还是删除了参数if (freeCount > 0)

删除 loop,检查已设置的 key:for (int i = buckets[targetBucket]; i >= 0; i = entries[i].next) {

添加后的打印状态

public void Add(TKey key, TValue value) {var hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;var targetBucket = hashCode % buckets!.Length;if (count == entries!.Length) {Resize(HashHelpers.ExpandPrime(count)); targetBucket = hashCode % buckets.Length; }var index = count; count++; entries[index].hashCode = hashCode; entries[index].next = buckets[targetBucket]; entries[index].key = key; entries[index].value = value; buckets[targetBucket] = index; Console.WriteLine();PrintFullState($"\ud83d\udce5 Add: {key} - {value}. hashCode = {hashCode}, targetBucket = {targetBucket}");}

现在看起来简单多了,对吧?现在让我们完成我们的类,实现 .GetValueOrDefault

获取价值

这一次的实现相当简单,甚至在原始代码中也是如此:GetValueOrDefault

internal TValue GetValueOrDefault(TKey key) {int i = FindEntry(key);if (i >= 0) {return entries[i].value; }return default(TValue);}

在搜索本身(在方法中):FindEntry

private int FindEntry(TKey key) {if( key == ) { ThrowHelper.ThrowArgumentException(ExceptionArgument.key); }if (buckets != ) {int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i; } }return -1;}

这里没有太多需要清理的地方。但是我们将添加大量日志,因为这是我们最重要的逻辑

private int FindEntry(TKey key){var hashCode = comparer.GetHashCode(key!) & 0x7FFFFFFF;var initialBucketIndex = hashCode % buckets.Length; Console.WriteLine(); Console.WriteLine($"\ud83d\udd0e Search. Key = {key}. Initial Bucket Index {initialBucketIndex}");for (var i = buckets[initialBucketIndex]; i >= 0; i = entries[i].next) { Console.WriteLine(); Console.WriteLine($"Comparing key from entries[{i}] ({entries[i]}) to {key}");if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) { Console.WriteLine($"Key is equal returning {i}");return i; }else { Console.WriteLine($"Key is not equal, moving to the next linked index ({entries[i].next})"); } } Console.WriteLine(); Console.WriteLine("Search exit condition met (i >= 0). Returning -1 (as not found)");return -1;}

这样就完成了我们的 .现在让我们使用它并查看我们准备的详细日志!EducationalDictionary

执行测试

我们将添加 4

0 阅读:0

程序员有二十年

简介:感谢大家的关注