Python Tricks - Common Data Structures in Python(1)

Dictionaries, Maps, and Hashtables

In Python, dictionaries (or “dicts” for short) are a central data structure. Dicts store an arbitrary number of objects, each identified by a unique dictionary key.

Dictionaries are also often called maps, hashmaps, lookup tables, or associative arrays. They allow for the efficient lookup, insertion, and deletion of any object associated with a given key.

What does this mean in practice? It turns out that phone books make a decent real-world analog for dictionary objects:

Phone books allow you to quickly retrieve the information (phone number) associated with a given key (a person’s name). So, instead of having to read a phone book front to back in order to find someone’s number, you can jump more or less directly to a name and look up the associated information.


This analogy breaks down somewhat when it comes to how the information is organized in order to allow for fast lookups. But the fundamental performance characteristics hold: Dictionaries allow you to quickly find the information associated with a given key.

In summary, dictionaries are one of the most frequently used and most important data structures in computer science.

So, how does Python handle dictionaries?

Let’s take a tour of the dictionary implementations available in core Python and the Python standard library.

dict – Your Go-To Dictionary

Because of their importance, Python features a robust dictionary implementation that’s built directly into the core language: the dict data type.

Python also provides some useful “syntactic sugar” for working with dictionaries in your programs. For example, the curly-braces dictionary expression syntax and dictionary comprehensions allow you to conveniently define new dictionary objects:

phonebook = {
  'bob': 7387,
  'alice': 3719,
  'jack': 7052,
}

squares = {x: x * x for x in range(6)}

>>> phonebook['alice']
3719

>>> squares
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}


There are some restrictions on which objects can be used as valid keys.

Python’s dictionaries are indexed by keys that can be of any hashable type: A hashable object has a hash value which never changes during its lifetime (see __hash__), and it can be compared to other objects (see __eq__). In addition, hashable objects which compare as equal must have the same hash value.

Immutable types like strings and numbers are hashable and work well as dictionary keys. You can also use tuple objects as dictionary keys, as long as they contain only hashable types themselves.
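The hashability rule can be checked directly. The following is a minimal sketch; the names grid, goal_label, list_key_error, and nested_ok are made up for illustration and are not part of the original text.

```python
# Tuples that contain only hashable values work as dictionary keys.
grid = {(0, 0): "start", (2, 3): "goal"}
goal_label = grid[(2, 3)]

# Lists are mutable and therefore unhashable, so using one as a key
# raises a TypeError when the dictionary is built.
try:
    bad = {[0, 0]: "start"}
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)

# A tuple is only hashable if everything inside it is hashable too.
try:
    hash((0, [1, 2]))
    nested_ok = True
except TypeError:
    nested_ok = False

print(goal_label)
print(list_key_error)
print(nested_ok)
```

The exact wording of the TypeError message varies between Python versions, so it is safer to rely on the exception type than on the message text.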

For most use cases, Python’s built-in dictionary implementation will do everything you need. Dictionaries are highly optimized and underlie many parts of the language. For example, class attributes and variables in a stack frame are both stored internally in dictionaries.
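One place where this is easy to observe is instance attributes. The sketch below assumes a small, hypothetical Point class; vars() returns the dictionary that backs an ordinary instance, so changes made through it show up as attributes.

```python
class Point:
    """Hypothetical example class with two attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)

# vars(p) returns p.__dict__, the dict that stores the instance attributes.
attrs = vars(p)
attrs_snapshot = dict(attrs)

# Writing to that dict is equivalent to setting an attribute.
attrs["z"] = 3
z_value = p.z

print(attrs_snapshot)
print(z_value)
```

This applies to regular classes; classes that define __slots__ do not necessarily have a per-instance dictionary.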

Python dictionaries are based on a well-tested and finely tuned hash table implementation that provides the performance characteristics you’d expect: O(1) time complexity for lookup, insert, update, and delete operations in the average case.

There’s little reason not to use the standard dict implementation included with Python. However, specialized third-party dictionary implementations exist, for example dictionaries based on skip lists or B-trees.

Besides “plain” dict objects, Python’s standard library also includes a number of specialized dictionary implementations. These specialized dictionaries are all based on the built-in dictionary class (and share its performance characteristics), but add some convenience features on top of that.

Let’s take a look at them.

collections.OrderedDict – Remember the Insertion Order of Keys

Python includes a specialized dict subclass that remembers the insertion order of keys added to it: collections.OrderedDict.

Standard dict instances also preserve the insertion order of keys. This began as a side effect of the CPython 3.6 implementation and has been guaranteed by the language specification since Python 3.7. Even so, if key order is important for your algorithm to work, you can communicate this clearly by explicitly using the OrderedDict class, which also adds order-aware features such as move_to_end() and order-sensitive equality comparisons.

By the way, OrderedDict is not a built-in part of the core language and must be imported from the collections module in the standard library.

>>> import collections
>>> d = collections.OrderedDict(one=1, two=2, three=3)

>>> d
OrderedDict([('one', 1), ('two', 2), ('three', 3)])

>>> d['four'] = 4
>>> d
OrderedDict([('one', 1), ('two', 2),
('three', 3), ('four', 4)])

>>> d.keys()
odict_keys(['one', 'two', 'three', 'four'])
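Beyond remembering insertion order, OrderedDict offers a few order-aware operations that a plain dict lacks. Here is a small sketch of three of them; the variable names are chosen for illustration only.

```python
from collections import OrderedDict

d = OrderedDict(one=1, two=2, three=3)

# move_to_end() relocates an existing key to the end of the ordering.
d.move_to_end("one")
order_after_move = list(d)

# popitem(last=False) removes and returns the first (oldest) item.
first_item = d.popitem(last=False)

# Comparing two OrderedDicts takes key order into account,
# while comparing plain dicts does not.
a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])
ordered_equal = a == b
plain_equal = dict(a) == dict(b)

print(order_after_move)
print(first_item)
print(ordered_equal, plain_equal)
```

If none of these behaviors matter to your code, a plain dict is usually the simpler choice.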

collections.defaultdict – Return Default Values for Missing Keys

The defaultdict class is another dictionary subclass that accepts a callable in its constructor whose return value will be used if a requested key cannot be found.

This can save you some typing and make your intentions clearer, as compared to using the get() method or catching a KeyError exception in regular dictionaries.

>>> from collections import defaultdict
>>> dd = defaultdict(list)

# Accessing a missing key creates it and
# initializes it using the default factory,
# i.e. list() in this example:
>>> dd['dogs'].append('Rufus')
>>> dd['dogs'].append('Kathrin')
>>> dd['dogs'].append('Mr Sniffles')
>>> dd['dogs']
['Rufus', 'Kathrin', 'Mr Sniffles']
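To make the comparison with regular dictionaries concrete, here is a sketch that groups words by their first letter, once with a plain dict and once with a defaultdict. The word list and variable names are invented for this example.

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Plain dict: every new key has to be initialized by hand.
by_letter_plain = {}
for word in words:
    if word[0] not in by_letter_plain:
        by_letter_plain[word[0]] = []
    by_letter_plain[word[0]].append(word)

# defaultdict: a missing key is created on first access via list().
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

same_groups = dict(by_letter) == by_letter_plain

# get() does not trigger the default factory and does not insert a key,
# but indexing with [] does both.
value_from_get = by_letter.get("z")
has_z_after_get = "z" in by_letter
_ = by_letter["z"]
has_z_after_index = "z" in by_letter

print(same_groups)
print(value_from_get, has_z_after_get, has_z_after_index)
```

The last few lines show a detail worth remembering: merely reading a missing key with square brackets adds it to a defaultdict, which can grow the dictionary unexpectedly.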

collections.ChainMap – Search Multiple Dictionaries as a Single Mapping

The collections.ChainMap data structure groups multiple dictionaries into a single mapping. Lookups search the underlying mappings one by one until a key is found. Insertions, updates, and deletions only affect the first mapping added to the chain.

>>> from collections import ChainMap
>>> dict1 = {'one': 1, 'two': 2}
>>> dict2 = {'three': 3, 'four': 4}
>>> chain = ChainMap(dict1, dict2)

>>> chain
ChainMap({'one': 1, 'two': 2}, {'three': 3, 'four': 4})
# ChainMap searches each collection in the chain
# from left to right until it finds the key (or fails):
>>> chain['three']
3
>>> chain['one']
1
>>> chain['missing']
KeyError: 'missing'
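The rule that writes only affect the first mapping is easiest to see with a layered-settings setup. This is a minimal sketch; the defaults and overrides dictionaries and their keys are hypothetical.

```python
from collections import ChainMap

defaults = {"color": "blue", "debug": False}
overrides = {"debug": True}

# Mappings listed first take priority during lookups.
settings = ChainMap(overrides, defaults)
debug_value = settings["debug"]
color_before = settings["color"]

# Assignment goes to the first mapping only; defaults stays untouched.
settings["color"] = "red"
overrides_after_set = dict(overrides)
defaults_color = defaults["color"]

# Deletion also targets the first mapping, so the value from
# defaults becomes visible again afterwards.
del settings["color"]
color_after_delete = settings["color"]

print(debug_value, color_before)
print(overrides_after_set, defaults_color)
print(color_after_delete)
```

Trying to delete a key that exists only in a later mapping raises a KeyError, because the deletion never looks past the first mapping.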

types.MappingProxyType – A Wrapper for Making Read-Only Dictionaries

MappingProxyType is a wrapper around a standard dictionary that provides a read-only view into the wrapped dictionary’s data. This class was added in Python 3.3, and it can be used to create immutable proxy versions of dictionaries.

For example, this can be helpful if you’d like to return a dictionary carrying internal state from a class or module, while discouraging write access to this object. Using MappingProxyType allows you to put these restrictions in place without first having to create a full copy of the dictionary.

>>> from types import MappingProxyType
>>> writable = {'one': 1, 'two': 2}
>>> read_only = MappingProxyType(writable)

# The proxy is read-only:
>>> read_only['one']
1
>>> read_only['one'] = 23
TypeError: 'mappingproxy' object does not support item assignment

# Updates to the original are reflected in the proxy:
>>> writable['one'] = 42
>>> read_only
mappingproxy({'one': 42, 'two': 2})
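The use case of exposing internal state from a class might look roughly like the sketch below. The Registry class, its register() method, and its entries property are hypothetical names introduced only for this illustration.

```python
from types import MappingProxyType


class Registry:
    """Hypothetical class that keeps its state in a private dict."""

    def __init__(self):
        self._entries = {}

    def register(self, name, value):
        self._entries[name] = value

    @property
    def entries(self):
        # Hand out a read-only view instead of copying the dict.
        return MappingProxyType(self._entries)


registry = Registry()
registry.register("one", 1)
view = registry.entries

# Callers cannot assign through the proxy.
try:
    view["two"] = 2
    write_blocked = False
except TypeError:
    write_blocked = True

# Changes made through the class are visible in the existing view.
registry.register("two", 2)
view_snapshot = dict(view)

print(write_blocked)
print(view_snapshot)
```

Note that the proxy only blocks changes to the mapping itself. If the stored values are mutable objects such as lists, callers can still modify those objects in place.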


Dictionaries in Python: Conclusion

All of the dictionary implementations listed in this chapter are built into Python or its standard library.

If you’re looking for a general recommendation on which mapping type to use in your programs, I’d point you to the built-in dict data type. It’s a versatile and optimized hash table implementation that’s built directly into the core language.

I would only recommend that you use one of the other data types listed here if you have special requirements that go beyond what’s provided by dict.

Yes, I still believe all of them are valid options, but your code will usually be clearer and easier for other developers to maintain if it relies on standard Python dictionaries most of the time.

Key Takeaways
  • Dictionaries are the central data structure in Python.
  • The built-in dict type will be “good enough” most of the time.
  • Specialized implementations, like read-only or ordered dicts, are available in the Python standard library.

