Tuesday, June 26, 2012

Exploring C# LinkedLists via LRU Caches (1 of 3)

This is the first of three parts in a sequence of explorations that started with a focus on caches, and ended with some alternatives to .Net’s LinkedList class. I needed a simple LRU (Least Recently Used) cache with fairly good performance. What I wanted was basically a collection with a fixed capacity; once its size reaches that capacity, it makes room for new entries by deleting the least recently accessed entry. I didn’t need entry expiration, so I had no need to track the time of use, just the order in which entries were used. A common solution is to pair a Dictionary&lt;TKey, TValue&gt; with a linked list. The keys uniquely identify the values stored in the cache so they can be retrieved. The linked list maintains the order of use of the contents, with the most recently used item at one end of the list and the least recently used item at the opposite end. A linked list is a good choice because every time we access an item that’s in the cache, we need to remove that item from its current location in the list and move it to the most recently used position, and linked lists offer low cost deletions and insertions.

I started with a basic implementation that uses .Net’s LinkedList&lt;TValue&gt;. There were two ways I could structure the cache’s fields. The first is to have the LinkedList store the key and value pairs the cache contains, and use the Dictionary to map each key to its associated LinkedListNode in the LinkedList. But the only thing the cache really needs to store in the LinkedList is the key, so that when the least recently used item is discarded from the LinkedList, the key can be used to delete the associated entry in the Dictionary as well. So the second choice is to store just keys in the LinkedList, and have the Dictionary map each key to a tuple of value and LinkedListNode. I chose the latter because it puts the pieces only in the collections where they’re needed. The following minimal implementation shows the basic operation of the class.
// Copyright ©2012 SoftWx, Inc.
// Released under the MIT License.
using System;
using System.Collections.Generic;
namespace SoftWx.Lib {

    /// <summary>
    /// Simple implementation of an LRU Cache (Least Recently Used). This uses
    /// LinkedList for its use-ordered list.
    /// </summary>
    /// <remarks> LruCache is conservative with regard to creating linked list node objects.
    /// Once the cache has filled, it re-uses the oldest node object to hold the incoming
    /// newest object. So over its entire lifetime, it will not create more than Capacity
    /// node objects.
    /// </remarks>
    public sealed class LruCacheOrig<TKey, TValue> {
        private object lockObj = new object();
        private int capacity;
        private Dictionary<TKey, Entry> cacheMap;
        private LinkedList<TKey> cacheList;

        private LruCacheOrig() { }

        /// <summary>
        /// Create a new instance of LruCache with the specified capacity.
        /// </summary>
        /// <param name="capacity">Maximum capacity of the cache.</param>
        public LruCacheOrig(int capacity) {
            this.capacity = capacity;
            this.cacheMap = new Dictionary<TKey, Entry>(capacity);
            this.cacheList = new LinkedList<TKey>();
        }

        /// <summary>Gets the maximum capacity of this LruCache.</summary>
        public int Capacity { get { return this.capacity; } }

        /// <summary>Gets the count of items contained in this LruCache.</summary>
        public int Count { get { return this.cacheMap.Count; } }

        /// <summary>Clear the contents of the LruCache.</summary>
        public void Clear() {
            lock (lockObj) {
                // break all links between nodes (may help garbage collection)
                this.cacheList.Clear();
                this.cacheMap.Clear();
            }
        }

        /// <summary>Try to get the cached value with the specified key.</summary>
        /// <param name="key">The key used to identify the cached item.</param>
        /// <param name="value">When this method returns, contains the value
        /// associated with the specified key, or the default value if the key
        /// is not in the cache.</param>
        /// <returns>True if the value associated with the specified key was found,
        /// otherwise false.</returns>
        public bool TryGetValue(TKey key, out TValue value) {
            lock (lockObj) {
                Entry entry;
                if (this.cacheMap.TryGetValue(key, out entry)) {
                    Touch(entry.node);
                    value = entry.value;
                    return true;
                }
            }
            value = default(TValue);
            return false;
        }

        /// <summary>
        /// Determines whether the LruCache contains the specified key.
        /// </summary>
        /// <param name="key">The key to locate in the LruCache.</param>
        /// <returns>True if the LruCache contains an item with the specified key,
        /// otherwise false.</returns>
        public bool ContainsKey(TKey key) {
            lock (lockObj) {
                return this.cacheMap.ContainsKey(key);
            }
        }
        
        /// <summary>
        /// Adds the specified value to the cache, associated with
        /// the specified key. If a value is in the cache already
        /// with that key, then the value in the cache is replaced 
        /// with the specified value.
        /// </summary>
        /// <param name="key">The key used to identify the cached item.</param>
        /// <param name="value">The value to be stored in the cache.</param>
        public void Add(TKey key, TValue value) {
            lock (lockObj) {
                Entry entry;
                if (!this.cacheMap.TryGetValue(key, out entry)) {
                    LinkedListNode<TKey> node;
                    if (this.cacheMap.Count == this.capacity) {
                        // cache full, so re-use the oldest node
                        node = this.cacheList.First;
                        this.cacheMap.Remove(node.Value);
                        this.cacheList.RemoveFirst();
                        node.Value = key;
                    } else {
                        // add newest
                        node = new LinkedListNode<TKey>(key);
                    }
                    this.cacheList.AddLast(node);
                    // map the key to the node and value
                    this.cacheMap.Add(key, new Entry(node, value));
                } else {
                    // key exists, so modify value and make node the newest
                    entry.value = value;
                    this.cacheMap[key] = entry; 
                    Touch(entry.node);
                }
            }
        }

        /// <summary>
        /// Make the cache item with the specified node be the most recently used item.
        /// Must be called from code that is in a lock block.
        /// </summary>
        /// <param name="node">The cached item LruNode.</param>
        private void Touch(LinkedListNode<TKey> node) {
            if (node != this.cacheList.Last) {
                // node is not already the newest
                this.cacheList.Remove(node);
                this.cacheList.AddLast(node);
            }
        }
        private struct Entry {
            public LinkedListNode<TKey> node;
            public TValue value;
            public Entry(LinkedListNode<TKey> node, TValue value) {
                this.node = node;
                this.value = value;
            }
        }
    }
}
For the tuple stored in the Dictionary, I chose to declare a simple private struct called Entry. This helps the code read better than if I had used something like KeyValuePair, and avoids the object allocations that would come with using the Tuple class.

The only optimization I made, other than just trying to avoid unnecessary code, was some extra logic in the Add method. Once the cache reaches its capacity, any further add requires that the oldest item be removed to make room for the new one. In this case, I re-purpose the oldest LinkedListNode to hold the new data being added rather than discard the oldest node and create a new one to insert in the linked list. This ensures that over its entire lifetime the cache will create no more LinkedListNode objects than its capacity (assuming Clear() isn't called). For caches with large capacity (>1,000,000) this can nearly cut in half the time needed to add a new item to the cache, and it's a little friendlier to the garbage collector.
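As a quick illustration of the eviction behavior described above, here's a small usage sketch (not from the original code download) that exercises the class with a capacity of 2:

```csharp
using System;
using SoftWx.Lib;

class Demo {
    static void Main() {
        var cache = new LruCacheOrig<string, int>(2);
        cache.Add("a", 1);
        cache.Add("b", 2);
        int value;
        cache.TryGetValue("a", out value);  // touch "a"; "b" is now least recently used
        cache.Add("c", 3);                  // cache is at capacity, so "b" is evicted
        Console.WriteLine(cache.ContainsKey("b")); // False
        Console.WriteLine(cache.ContainsKey("a")); // True
    }
}
```

Note that a successful TryGetValue counts as a use, which is why "a" survives the eviction even though it was added before "b".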

LinkedListCache Performance
[Graph: time per 1 million operations for the LinkedList-based LruCache, by cache capacity]
There are four significant operations for this sort of cache: adding an item to a cache with spare capacity, adding an item to a cache that's at capacity (resulting in the oldest item being removed first), successfully getting an item from the cache, and trying to get an item from the cache but failing because it isn't present. Here is a graph of the performance over different cache capacities, measured as the time (in milliseconds) required to perform 1 million operations against the cache. Tests were run on a run-of-the-mill laptop.
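The benchmark harness itself isn't shown in this post. A minimal sketch (my own, not the harness that produced the graph) of how one of these measurements could be taken, using the at-capacity add as the example:

```csharp
using System;
using System.Diagnostics;
using SoftWx.Lib;

class Benchmark {
    static void Main() {
        // Hypothetical timing sketch; capacity and counts are illustrative.
        const int capacity = 100000;
        const int operations = 1000000;
        var cache = new LruCacheOrig<int, int>(capacity);
        for (int i = 0; i < capacity; i++) cache.Add(i, i); // fill to capacity

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < operations; i++) {
            cache.Add(capacity + i, i); // every add now evicts the oldest entry
        }
        sw.Stop();
        Console.WriteLine("{0:N0} at-capacity adds: {1} ms",
            operations, sw.ElapsedMilliseconds);
    }
}
```

The other three operations can be timed the same way, swapping the body of the timed loop for an under-capacity Add, a hitting TryGetValue, or a missing TryGetValue.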


In the next post, I’ll make a modified version of this class that uses a private implementation of a linked list.