Choosing Collections in .NET Framework
(In progress...)
Big O Notation
http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly
.
The Generic Collections: System.Collections.Generic namespace
Associative Collection Classes (stores a value by a key, associates the value with the key)
useful to lookup/manipulate a collection using a key value
The Dictionary<TKey,TValue>
Best for high performance lookups.
is the fastest class for associative lookups/inserts/deletes because it uses a hash table under the covers
The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.
the key type should correctly implement GetHashCode() and Equals() appropriately
or you should provide an external IEqualityComparer to the dictionary on construction.
O(1)
insert/delete/lookup time of items in the dictionary is amortized constant time - O(1)
- which means no matter how big the dictionary gets, the time it takes to find something remains relatively constant
This is highly desirable for high-speed lookups
downside is that the dictionary, by nature of using a hash table, is unordered, so you cannot easily traverse the items in a Dictionary in order
The SortedDictionary<TKey,TValue>
Compromise of Dictionary speed and ordering, uses binary search tree.
is similar to the Dictionary<TKey,TValue> in usage
differents in implementation
gives fast lookups (slover then Dictionary<TKey,TValue>) but also the ability to maintain the collection in order by the key
O(1) == O(log 10)
O(log 10) < O(log 11)
O(log n)
requires a little bit of lookup time for the ability to maintain the items in order, thus insert/delete/lookup times in a sorted dictionary are logarithmic - O(log n).
has a cost of maintaining a constant sorting
uses a binary tree under the covers to maintain the items in order by the key
As a consequence of sorting, the type used for the key must correctly implement IComparable<TKey> so that the keys can be correctly sorted.
The SortedList<TKey,TValue>
Very similar to SortedDictionary, except tree is implemented in an array, so has faster lookup on preloaded data, but slower loads.
is implemented as an array of key/value pairs, sorted by the key.
uses less memory than SortedDictionary<TKey, TValue>
SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data - O(n)
Lookup - O(log n)
supports efficient indexed retrieval of keys and values through the collections returned by the Keys and Values properties.
Key objects must be immutable as long as they are used as keys in the SortedList<TKey, TValue>.
Has capacity management.
The Non-Associative Containers
The List<T>
Best for smaller lists where direct access required and no sorting.
items are stored contiguously as an array, you can access items in the List<T> by index very quickly
inserting and removing in the beginning or middle of the List<T> are very costly
because you must shift all the items up or down as you delete or insert respectively.
adding and removing at the end of a List<T> is an amortized constant operation - O(1)
typically we favor a List<T> even over arrays unless we are sure the size will remain absolutely fixed.
The LinkedList<T>
Best for lists where inserting/deleting in middle is common and no direct access required.
is a basic implementation of a doubly-linked list.
By this you can add or remove items in the middle of a linked list very quickly
(because there's no items to move up or down in contiguous memory)
It lost the ability to index items by position quickly
We favor List<T> over LinkedList<T> unless you are doing a lot of adding and removing.
The HashSet<T>
Unique unordered collection, like a Dictionary except key and value are same object.
is an unordered collection of unique items.
Logically, this is very similar to having a Dictionary<TKey,TValue> where the TKey and TValue both refer to the same object.
is useful for super-quick lookups
This collection is very useful for maintaining a collection of items you wish to check membership against.
like in Dictionary, the type T should have a valid implementation of GetHashCode() and Equals(),
or you should provide an appropriate IEqualityComparer<T> to the HashSet<T> on construction.
The SortedSet<T>
Unique sorted collection, like SortedDictionary except key and value are same object.
is a binary tree where the key and value are the same object.
The SortedSet<T> is to HashSet<T> what the SortedDictionary<TKey,TValue> is to Dictionary<TKey,TValue>.
adding/removing/lookups are logarithmic - O(log n)
the ability to iterate over the items in order.
!!! to be effective, type T must implement IComparable<T> or you need to supply an external IComparer<T>.
the Stack<T> and Queue<T>
Essentially same as List<T>
sequential collections
Stack<T> - last-in-first-out (LIFO)
Queue<T> - first-in-first-out (FIFO)
The summarize table at the
http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
The Original Collections: System.Collections namespace
considered deprecated by developers and by Microsoft itself.
mostly used when you are dealing with legacy .NET code
in some cases less performant than their generic counterparts.
ArrayList
A dynamic, contiguous collection of objects.
use the List<T> instead.
Hashtable
Associative, unordered collection of key-value pairs of objects.
use the Dictionary<TKey,TValue> instead.
SortedList
Associative, ordered collection of key-value pairs of objects.
use the SortedList<T> instead.
Queue
Stack
The Concurrent Collections: System.Collections.Concurrent namespace
They ignore the ICollection's IsSynchronized (always return false) and the SyncRoot (always returns null)
all the concurrent containers are safe for enumeration even while being modified
The main collections:
ConcurrentDictionary
Optimized for multiple readers (allows multiple readers under same lock).
ConcurrentBag
unordered collection of objects.
Optimized for situations where a thread may be bother reader and writer.
BlockingCollection
Wrapper collection that implement producers & consumers paradigm.
Readers can block until items are available to read.
Writers can block until space is available to write (if bounded).
ConcurrentStack
achieves thread-safe access by using System.Threading.Interlocked operations.
This means that the multi-threaded access to the stack requires no traditional locking and is very, very fast!
Interesting scenario: We may iterate collection in the foreach and change it on the fly - see the GetEnumerator() below.
Pop() was removed by TryPop()
PushRange() and TryPopRange() were added
Allows to deal with multiple items atomically.
Push() works just like you’d expect
Count takes a snapshot of the stack and then counts the items.
it is a O(n) operation, and if you just want to check for an empty stack, call IsEmpty instead which is O(1).
ToArray() and GetEnumerator() both also take snapshots.
iteration over a stack will give you a static view at the time of the call and will not reflect updates.
Peek() method has been removed in favor of a TryPeek()
ConcurrentQueue
Most notably changes:
Dequeue() was removed by TryDequeue()
Returns true if an item existed and was dequeued and false if empty.
Count does not take a snapshot
subtracts the head and tail index to get the count.
This results overall in a O(1) complexity which is quite good.
It’s still recommended, however, that for empty checks you call IsEmpty instead of comparing Count to zero.
ToArray() and GetEnumerator() both take snapshots.
Peek() method has been removed in favor of a TryPeek()
From:
http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
MSDN
http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly
Big O Notation
http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly
.
The Generic Collections: System.Collections.Generic namespace
Associative Collection Classes (stores a value by a key, associates the value with the key)
useful to lookup/manipulate a collection using a key value
The Dictionary<TKey,TValue>
Best for high performance lookups.
is the fastest class for associative lookups/inserts/deletes because it uses a hash table under the covers
The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.
the key type should correctly implement GetHashCode() and Equals() appropriately
or you should provide an external IEqualityComparer to the dictionary on construction.
O(1)
insert/delete/lookup time of items in the dictionary is amortized constant time - O(1)
- which means no matter how big the dictionary gets, the time it takes to find something remains relatively constant
This is highly desirable for high-speed lookups
downside is that the dictionary, by nature of using a hash table, is unordered, so you cannot easily traverse the items in a Dictionary in order
The SortedDictionary<TKey,TValue>
Compromise of Dictionary speed and ordering, uses binary search tree.
is similar to the Dictionary<TKey,TValue> in usage
differents in implementation
gives fast lookups (slover then Dictionary<TKey,TValue>) but also the ability to maintain the collection in order by the key
O(1) == O(log 10)
O(log 10) < O(log 11)
O(log n)
requires a little bit of lookup time for the ability to maintain the items in order, thus insert/delete/lookup times in a sorted dictionary are logarithmic - O(log n).
has a cost of maintaining a constant sorting
uses a binary tree under the covers to maintain the items in order by the key
As a consequence of sorting, the type used for the key must correctly implement IComparable<TKey> so that the keys can be correctly sorted.
The SortedList<TKey,TValue>
Very similar to SortedDictionary, except tree is implemented in an array, so has faster lookup on preloaded data, but slower loads.
is implemented as an array of key/value pairs, sorted by the key.
uses less memory than SortedDictionary<TKey, TValue>
SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data - O(n)
Lookup - O(log n)
supports efficient indexed retrieval of keys and values through the collections returned by the Keys and Values properties.
Key objects must be immutable as long as they are used as keys in the SortedList<TKey, TValue>.
Has capacity management.
The Non-Associative Containers
The List<T>
Best for smaller lists where direct access required and no sorting.
items are stored contiguously as an array, you can access items in the List<T> by index very quickly
inserting and removing in the beginning or middle of the List<T> are very costly
because you must shift all the items up or down as you delete or insert respectively.
adding and removing at the end of a List<T> is an amortized constant operation - O(1)
typically we favor a List<T> even over arrays unless we are sure the size will remain absolutely fixed.
The LinkedList<T>
Best for lists where inserting/deleting in middle is common and no direct access required.
is a basic implementation of a doubly-linked list.
By this you can add or remove items in the middle of a linked list very quickly
(because there's no items to move up or down in contiguous memory)
It lost the ability to index items by position quickly
We favor List<T> over LinkedList<T> unless you are doing a lot of adding and removing.
The HashSet<T>
Unique unordered collection, like a Dictionary except key and value are same object.
is an unordered collection of unique items.
Logically, this is very similar to having a Dictionary<TKey,TValue> where the TKey and TValue both refer to the same object.
is useful for super-quick lookups
This collection is very useful for maintaining a collection of items you wish to check membership against.
like in Dictionary, the type T should have a valid implementation of GetHashCode() and Equals(),
or you should provide an appropriate IEqualityComparer<T> to the HashSet<T> on construction.
The SortedSet<T>
Unique sorted collection, like SortedDictionary except key and value are same object.
is a binary tree where the key and value are the same object.
The SortedSet<T> is to HashSet<T> what the SortedDictionary<TKey,TValue> is to Dictionary<TKey,TValue>.
adding/removing/lookups are logarithmic - O(log n)
the ability to iterate over the items in order.
!!! to be effective, type T must implement IComparable<T> or you need to supply an external IComparer<T>.
the Stack<T> and Queue<T>
Essentially same as List<T>
sequential collections
Stack<T> - last-in-first-out (LIFO)
Queue<T> - first-in-first-out (FIFO)
The summarize table at the
http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
The Original Collections: System.Collections namespace
considered deprecated by developers and by Microsoft itself.
mostly used when you are dealing with legacy .NET code
in some cases less performant than their generic counterparts.
ArrayList
A dynamic, contiguous collection of objects.
use the List<T> instead.
Hashtable
Associative, unordered collection of key-value pairs of objects.
use the Dictionary<TKey,TValue> instead.
SortedList
Associative, ordered collection of key-value pairs of objects.
use the SortedList<T> instead.
Queue
Stack
The Concurrent Collections: System.Collections.Concurrent namespace
They ignore the ICollection's IsSynchronized (always return false) and the SyncRoot (always returns null)
all the concurrent containers are safe for enumeration even while being modified
The main collections:
ConcurrentDictionary
Optimized for multiple readers (allows multiple readers under same lock).
ConcurrentBag
unordered collection of objects.
Optimized for situations where a thread may be bother reader and writer.
BlockingCollection
Wrapper collection that implement producers & consumers paradigm.
Readers can block until items are available to read.
Writers can block until space is available to write (if bounded).
ConcurrentStack
achieves thread-safe access by using System.Threading.Interlocked operations.
This means that the multi-threaded access to the stack requires no traditional locking and is very, very fast!
Interesting scenario: We may iterate collection in the foreach and change it on the fly - see the GetEnumerator() below.
Pop() was removed by TryPop()
PushRange() and TryPopRange() were added
Allows to deal with multiple items atomically.
Push() works just like you’d expect
Count takes a snapshot of the stack and then counts the items.
it is a O(n) operation, and if you just want to check for an empty stack, call IsEmpty instead which is O(1).
ToArray() and GetEnumerator() both also take snapshots.
iteration over a stack will give you a static view at the time of the call and will not reflect updates.
Peek() method has been removed in favor of a TryPeek()
ConcurrentQueue
Most notably changes:
Dequeue() was removed by TryDequeue()
Returns true if an item existed and was dequeued and false if empty.
Count does not take a snapshot
subtracts the head and tail index to get the count.
This results overall in a O(1) complexity which is quite good.
It’s still recommended, however, that for empty checks you call IsEmpty instead of comparing Count to zero.
ToArray() and GetEnumerator() both take snapshots.
Peek() method has been removed in favor of a TryPeek()
From:
http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
MSDN
http://stackoverflow.com/questions/2307283/what-does-olog-n-mean-exactly
Comments
Post a Comment