13  Collections

The previous chapter introduced Python’s scalar types: individual values like numbers, strings, and booleans. Real-world data, though, comes in groups. A database table is a collection of rows. A CSV file is a collection of records. An order has a collection of line items. To work with grouped data, you need collection types: ways of organizing multiple values together so you can process them as a unit.

This chapter covers Python’s four built-in collection types, each with a distinct purpose. You’ll learn how to create them, access their elements, and modify them. The next chapter will teach you how to make decisions about that data and process it systematically.

13.1 Lists

A list is an ordered, mutable collection of values. You create one with square brackets:

lists.py
products = ["Chai", "Chang", "Aniseed Syrup", "Chef Anton's Cajun Seasoning"]
prices = [18.00, 19.00, 10.00, 22.00]
mixed = [42, "hello", True, None, 3.14]  # Lists can hold any type
empty = []

Unlike arrays in many other languages, Python lists can hold values of different types. In practice, you’ll usually keep lists homogeneous (all the same type) because mixed-type lists are hard to reason about and harder to process.

13.1.1 Indexing

You access individual elements by their position, starting from zero:

indexing.py
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu"]

products[0]    # "Chai"       (first element)
products[1]    # "Chang"      (second element)
products[3]    # "Tofu"       (fourth element)
products[-1]   # "Tofu"       (last element)
products[-2]   # "Aniseed Syrup" (second to last)
Warning: Zero-based indexing

Python, like most programming languages, counts from zero. The first element is at index 0, the second is at index 1, and so on. If you’re coming from MATLAB, where indexing starts at 1, this takes some adjustment. Off-by-one errors are among the most common bugs in programming, and they often come from forgetting which index a language starts at.

A list with n elements has valid indices from 0 to n - 1.

Negative indices count backward from the end. -1 is the last element, -2 is the second to last, and so on. This is useful when you don’t know the length of a list but need to access elements near the end.
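One way to read a negative index is as shorthand: for a list of length n, index -k refers to the same element as index n - k. The sketch below checks that relationship on the list from the indexing example; the variable names are illustrative only.

```python
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu"]
n = len(products)  # 4

# A negative index -k points at the same element as index n - k
last = products[-1]
same_as_last = products[n - 1]

second_to_last = products[-2]
same_as_second_to_last = products[n - 2]

print(last == same_as_last)                       # True
print(second_to_last == same_as_second_to_last)   # True
```

The negative form is usually preferable because it keeps working if the list later grows or shrinks.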

13.1.2 Slicing

A slice extracts a portion of a list. The syntax is list[start:stop], where start is included and stop is excluded:

slicing.py
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu", "Miso"]

products[1:3]   # ["Chang", "Aniseed Syrup"]  (index 1 and 2, not 3)
products[:2]    # ["Chai", "Chang"]            (from the beginning)
products[2:]    # ["Aniseed Syrup", "Tofu", "Miso"]  (to the end)
products[:]     # A copy of the entire list
products[::2]   # ["Chai", "Aniseed Syrup", "Miso"]  (every other element)

The start:stop:step form lets you skip elements. products[::2] takes every second element. products[::-1] reverses the list entirely.
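As a small illustration of the step argument, the sketch below reverses the list and also combines a start with a step. Slicing builds a new list, so the original stays as it was.

```python
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu", "Miso"]

# A step of -1 walks the list backward
reversed_products = products[::-1]
# ["Miso", "Tofu", "Aniseed Syrup", "Chang", "Chai"]

# Start at index 1, then take every second element
odd_positions = products[1::2]
# ["Chang", "Tofu"]

# Slices return new lists; products itself is unchanged
print(products[0])  # Chai
```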

Note: Slices never raise IndexError

If you access an index that doesn’t exist, like products[99], Python raises an IndexError. But slices are forgiving: products[99:200] simply returns an empty list. This makes slicing safe for boundary cases.
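The contrast can be seen side by side in a short sketch. The try/except statement used here to catch the error has not been introduced yet; treat it as a preview rather than something you need to write now.

```python
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu", "Miso"]

# Indexing past the end raises IndexError
try:
    item = products[99]
except IndexError:
    item = None  # fall back to None when the index does not exist

# Slicing past the end quietly returns whatever is available
tail = products[99:200]     # []
clipped = products[3:200]   # ["Tofu", "Miso"]

print(item, tail, clipped)
```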

13.1.3 Mutability

Lists are mutable, meaning you can change their contents after creation:

list_mutation.py
products = ["Chai", "Chang", "Tofu"]

# Change an element
products[0] = "Earl Grey"       # ["Earl Grey", "Chang", "Tofu"]

# Add elements
products.append("Miso")         # ["Earl Grey", "Chang", "Tofu", "Miso"]
products.insert(1, "Matcha")    # ["Earl Grey", "Matcha", "Chang", "Tofu", "Miso"]

# Remove elements
products.pop()                  # Removes and returns "Miso"
products.remove("Chang")        # Removes the first occurrence of "Chang"

# Extend with another list
products.extend(["Soy Sauce", "Wasabi"])

The distinction between append() and extend() matters. append() adds a single element to the end. extend() adds every element from another iterable:

append_vs_extend.py
a = [1, 2, 3]
a.append([4, 5])    # [1, 2, 3, [4, 5]]  ← a nested list!

b = [1, 2, 3]
b.extend([4, 5])    # [1, 2, 3, 4, 5]    ← flat list
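Because extend() accepts any iterable, not only lists, it also works with tuples and strings. A brief sketch (the variable names c and letters are illustrative):

```python
c = [1, 2, 3]
c.extend((4, 5))        # A tuple works too: [1, 2, 3, 4, 5]

letters = ["a"]
letters.extend("bc")    # A string is an iterable of characters: ["a", "b", "c"]

print(c)
print(letters)
```

The string case is a common surprise: to add the whole string as one element, use append() instead.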

13.1.4 Useful List Operations

A few more operations you’ll use regularly:

list_ops.py
numbers = [3, 1, 4, 1, 5, 9, 2, 6]

len(numbers)            # 8 (number of elements)
min(numbers)            # 1
max(numbers)            # 9
sum(numbers)            # 31
numbers.count(1)        # 2 (how many times 1 appears)
numbers.index(5)        # 4 (index of first occurrence of 5)
sorted(numbers)         # [1, 1, 2, 3, 4, 5, 6, 9] (returns a new list)
numbers.sort()          # Sorts in place, returns None
numbers.reverse()       # Reverses in place: [9, 6, 5, 4, 3, 2, 1, 1]

Note the difference between sorted(numbers), which returns a new sorted list and leaves the original unchanged, and numbers.sort(), which sorts the list in place and returns None. This distinction between “returns a new thing” and “modifies in place” is a recurring theme in Python.
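The same pattern shows up with other list operations. The sketch below contrasts reverse(), which works in place, with a reversing slice, which builds a new list, and shows a common slip: assigning the result of an in-place method.

```python
numbers = [3, 1, 4]

# In place: reverse() changes the list and returns None
outcome = numbers.reverse()
print(numbers)   # [4, 1, 3]
print(outcome)   # None

# New object: a slice builds a reversed copy and leaves numbers alone
copy_reversed = numbers[::-1]
print(copy_reversed)  # [3, 1, 4]

# append() is also in place, so its return value is None
appended = numbers.append(7)
print(appended)  # None
print(numbers)   # [4, 1, 3, 7]
```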

13.1.5 Exercises

  1. Given prices = [18.00, 19.00, 10.00, 22.00, 23.25, 6.50], write expressions to get: the first three prices, the last two prices, and every other price starting from the first. What does prices[10] produce? What about prices[10:20]?

  2. Predict the output of this code without running it, then verify:

solution.py
items = [1, 2, 3]
items.append([4, 5])
print(len(items))
print(items[-1])
  3. Given products = ["Chai", "Chang", "Tofu", "Miso", "Chai"], use list methods to: count how many times "Chai" appears, find the index of "Tofu", and remove the first occurrence of "Chai". What does the list look like after the removal?

  4. What is the difference between sorted(numbers) and numbers.sort()? Write code that demonstrates the difference by printing the return value of each.

1. prices[:3] → [18.0, 19.0, 10.0], prices[-2:] → [23.25, 6.50], prices[::2] → [18.0, 10.0, 23.25]. prices[10] raises IndexError. prices[10:20] returns [] (slices never raise IndexError).

2. len(items) → 4 (not 5, because append added the list [4, 5] as a single element). items[-1] → [4, 5] (a nested list).

3.

solution.py
products = ["Chai", "Chang", "Tofu", "Miso", "Chai"]
products.count("Chai")  # 2
products.index("Tofu")  # 2
products.remove("Chai")
# products is now ["Chang", "Tofu", "Miso", "Chai"]

4. sorted(numbers) returns a NEW sorted list and leaves the original unchanged. numbers.sort() sorts in place and returns None.

solution.py
numbers = [3, 1, 4, 1, 5]
result_sorted = sorted(numbers)
print(result_sorted)  # [1, 1, 3, 4, 5]
print(numbers)        # [3, 1, 4, 1, 5] (unchanged)

result_sort = numbers.sort()
print(result_sort)    # None
print(numbers)        # [1, 1, 3, 4, 5] (modified in place)

13.2 Tuples and Unpacking

A tuple looks like a list but uses parentheses instead of square brackets, and it’s immutable: once created, you can’t change its contents.

tuples.py
coordinates = (40.4406, -79.9959)  # Pittsburgh, PA
product = ("Chai", 18.00, 39)
single = (42,)                      # Note the trailing comma for one-element tuples

Tuples are useful when you have a fixed collection of related values that shouldn’t change, like a coordinate pair or a database record. You can read elements by index, just like lists:

tuple_access.py
coordinates[0]   # 40.4406
coordinates[1]   # -79.9959

But you can’t modify them:

tuple_immutable.py
coordinates[0] = 41.0  # TypeError: 'tuple' object does not support item assignment
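Since a tuple cannot be edited, the usual way to "change" one is to build a new tuple and rebind the name. A minimal sketch; the try/except statement is used here only to show the error without stopping the program:

```python
coordinates = (40.4406, -79.9959)

# Item assignment fails because tuples are immutable
try:
    coordinates[0] = 41.0
except TypeError as error:
    message = str(error)

print(message)  # 'tuple' object does not support item assignment

# Build a new tuple from the old values and rebind the name
coordinates = (41.0, coordinates[1])
print(coordinates)  # (41.0, -79.9959)
```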

13.2.1 Tuple Unpacking

The most powerful feature of tuples is unpacking: assigning each element to a separate variable in one statement.

unpacking.py
product = ("Chai", 18.00, 39)

name, price, stock = product  # Unpacks into three variables
print(name)    # Chai
print(price)   # 18.0
print(stock)   # 39

If you don’t need every value, use _ as a throwaway placeholder:

throwaway.py
name, _, stock = product  # We don't need the price right now

Unpacking works with any iterable, not just tuples. You’ll see it frequently with functions that return multiple values and with loops over dictionaries.
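The sketch below shows both cases: a function that returns two values as a tuple, and unpacking applied to a list. The helper price_range is hypothetical and exists only for this illustration. Note that the number of variables on the left must match the number of elements, or Python raises a ValueError.

```python
def price_range(prices):
    """Hypothetical helper: return the lowest and highest price as a tuple."""
    return min(prices), max(prices)

# The returned tuple is unpacked into two variables
lowest, highest = price_range([18.00, 19.00, 10.00, 22.00])
print(lowest, highest)  # 10.0 22.0

# Unpacking works on lists as well
first, second = ["Chai", "Chang"]

# Too many (or too few) elements raises ValueError
try:
    a, b = (1, 2, 3)
except ValueError:
    mismatch = True
```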

13.3 Dictionaries

A dictionary stores key-value pairs. Each key maps to a value, like a lookup table. You create one with curly braces:

dicts.py
product = {
    "name": "Chai",
    "category": "Beverages",
    "unit_price": 18.00,
    "units_in_stock": 39,
    "discontinued": False,
}

If you’ve used SQL, think of a dictionary as a single row from a table, where the keys are column names and the values are cell values.

13.3.1 Accessing Values

You access values by key, not by position:

dict_access.py
product["name"]          # "Chai"
product["unit_price"]    # 18.0
product["color"]         # KeyError! Key doesn't exist

Accessing a key that doesn’t exist raises a KeyError. To avoid this, use the .get() method, which returns a default value instead:

dict_get.py
product.get("name")              # "Chai"
product.get("color")             # None (default when key is missing)
product.get("color", "Unknown")  # "Unknown" (custom default)

The .get() method is safer than bracket access when you aren’t certain a key exists. It’s especially useful when processing data that might have missing fields.
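As a sketch of the missing-fields case, suppose some records have a "color" field and others do not. The records below are made up for illustration and are not part of the Northwind data.

```python
# Hypothetical records; the second one has no "color" field
records = [
    {"name": "Chai", "color": "Brown"},
    {"name": "Tofu"},
]

colors = []
for record in records:
    # .get() falls back to "Unknown" instead of raising KeyError
    colors.append(record.get("color", "Unknown"))

print(colors)  # ['Brown', 'Unknown']

# The in operator tests whether a key exists without fetching its value
has_color = "color" in records[1]
print(has_color)  # False
```

One caveat: .get() without a custom default returns None both when the key is missing and when the key is present with the value None. When that difference matters, test with in first.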

13.3.2 Modifying Dictionaries

Dictionaries are mutable. You can add, change, and remove key-value pairs:

dict_modify.py
product = {"name": "Chai", "price": 18.00}

# Add a new key
product["category"] = "Beverages"

# Change an existing key
product["price"] = 19.50

# Remove a key
del product["category"]

# Remove and return a value
price = product.pop("price")          # 19.5
missing = product.pop("color", None)  # None (no error if key is missing)

13.3.3 Iterating Over Dictionaries

Dictionaries provide three views of their contents:

dict_iteration.py
product = {"name": "Chai", "price": 18.00, "stock": 39}

product.keys()     # dict_keys(["name", "price", "stock"])
product.values()   # dict_values(["Chai", 18.0, 39])
product.items()    # dict_items([("name", "Chai"), ("price", 18.0), ("stock", 39)])

The .items() method is especially useful because it gives you both keys and values, which you can unpack in a loop:

dict_loop.py
for key, value in product.items():
    print(f"{key}: {value}")
output
name: Chai
price: 18.0
stock: 39

13.3.4 Nested Dictionaries

Dictionaries can contain other dictionaries, creating hierarchical structures that mirror JSON data:

nested_dict.py
order = {
    "order_id": 10248,
    "customer": {
        "id": "VINET",
        "name": "Vins et alcools Chevalier",
        "country": "France",
    },
    "items": [
        {"product": "Quiche", "quantity": 12, "unit_price": 9.20},
        {"product": "Mozzarella", "quantity": 10, "unit_price": 34.80},
    ],
}

order["customer"]["name"]     # "Vins et alcools Chevalier"
order["items"][0]["quantity"]  # 12

This structure mirrors how you’ll see data in the real world. A database query might return rows; each row is a dictionary. Multiple rows form a list of dictionaries. A more complex query might return nested structures like the order example above.
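A list of dictionaries might look like the sketch below. The two rows reuse product names and prices from earlier in the chapter; the variable name rows and the exact keys are assumptions for illustration, not the output of any particular database library.

```python
# Each dictionary plays the role of one row; the list is the result set
rows = [
    {"name": "Chai", "unit_price": 18.00},
    {"name": "Chang", "unit_price": 19.00},
]

# Index into the list first, then look up the column by key
second_name = rows[1]["name"]
print(second_name)  # Chang

# Loop over rows and read the same key from each one
total_price = 0.0
for row in rows:
    total_price += row["unit_price"]

print(total_price)  # 37.0
```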

13.3.5 Exercises

  1. Given the Northwind product dictionary below, write expressions to: get the product name, safely get a "color" key that doesn’t exist (returning "N/A"), and add a new key "category" with value "Beverages".
solution.py
product = {"name": "Chai", "price": 18.00, "stock": 39, "discontinued": False}
  2. Given order (the nested dictionary from the chapter), write an expression to get the quantity of the second item in the order. Then write a loop that prints each item’s product name and total cost (quantity × unit_price).

  3. Given two sets of customer countries from different regions (sets are covered in Section 13.4):

setup.py
europe = {"France", "Germany", "UK", "Spain"}
americas = {"USA", "Brazil", "Canada", "France"}

Use set operations to find: countries that appear in both regions (there shouldn’t be any in a real dataset, but "France" is in both here), countries unique to Europe, and all countries combined.

  4. Explain why {} creates an empty dictionary, not an empty set. How do you create each?

1.

solution.py
product["name"]                      # "Chai"
product.get("color", "N/A")          # "N/A"
product["category"] = "Beverages"    # Adds the key

2.

solution.py
order["items"][1]["quantity"]  # 10

for item in order["items"]:
    total = item["quantity"] * item["unit_price"]
    print(f"{item['product']}: ${total:.2f}")

3.

solution.py
europe & americas    # {"France"} (intersection)
europe - americas    # {"Germany", "UK", "Spain"} (difference)
europe | americas    # {"France", "Germany", "UK", "Spain", "USA", "Brazil", "Canada"} (union)

4. {} creates an empty dictionary because dictionaries were in Python before sets, so the {} literal was already taken. Use set() to create an empty set and {} (or dict()) for an empty dictionary.

13.4 Sets

A set is an unordered collection of unique elements:

sets.py
categories = {"Beverages", "Condiments", "Seafood", "Beverages"}
print(categories)  # {"Beverages", "Condiments", "Seafood"} ← duplicate removed (display order may vary)

Sets are useful for two things: removing duplicates and performing membership tests efficiently.
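Both uses can be sketched in a few lines. The list of countries is made up for illustration. Because sets are unordered, the example uses sorted() to get a predictable list back.

```python
countries = ["France", "Germany", "France", "USA", "Germany"]

# Converting a list to a set drops the duplicates
unique_countries = set(countries)
print(len(unique_countries))  # 3

# Membership tests use the in operator
print("USA" in unique_countries)     # True
print("Brazil" in unique_countries)  # False

# sorted() turns the set back into an ordered list
ordered = sorted(unique_countries)
print(ordered)  # ['France', 'Germany', 'USA']
```

Beyond these two uses, sets also support the operations of set algebra: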

set_operations.py
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

a | b    # {1, 2, 3, 4, 5, 6, 7, 8}  Union
a & b    # {4, 5}                      Intersection
a - b    # {1, 2, 3}                   Difference (in a but not in b)
a ^ b    # {1, 2, 3, 6, 7, 8}         Symmetric difference (in one but not both)

If you remember set operations from a math or statistics course, these work exactly the same way. If you’ve used SQL’s UNION, INTERSECT, and EXCEPT, sets are the Python equivalent.
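Each operator also has a method form with a descriptive name, which some readers find easier to scan. A sketch using the same two sets:

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

# Method forms give the same results as the operators
print(a.union(b) == (a | b))                 # True
print(a.intersection(b) == (a & b))          # True
print(a.difference(b) == (a - b))            # True
print(a.symmetric_difference(b) == (a ^ b))  # True

# Unlike the operators, the methods accept any iterable, such as a list
combined = a.union([6, 7])
print(combined == {1, 2, 3, 4, 5, 6, 7})     # True
```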

Note: Creating an empty set

{} creates an empty dictionary, not an empty set. To create an empty set, use set():

empty_set.py
empty_dict = {}       # This is a dictionary
empty_set = set()     # This is a set

Summary

Python provides four built-in collection types, each serving a specific purpose. Lists store ordered, mutable sequences you access by position, making them ideal for processing data where order matters. Tuples offer immutable, fixed-length collections for grouping related values, and their unpacking feature lets you elegantly assign multiple values in a single statement. Dictionaries map keys to values, mirroring database records and providing efficient lookup by name rather than position. Sets store unique elements and support mathematical operations like union and intersection, useful for deduplication and membership testing.

These collections are the foundation for data processing. In the next chapter, you’ll learn how to make decisions about data and process collections systematically using control flow: if statements for conditionals and for/while loops for repetition. Together, collections and control flow form the core of Python programming.

Glossary

dictionary
A mutable collection of key-value pairs. Keys must be unique and hashable. Created with {} or dict().
immutable
An object that cannot be changed after creation. Strings, tuples, and frozensets are immutable.
iterable
Any object that can be looped over with a for statement. Lists, tuples, dictionaries, sets, strings, and ranges are all iterable.
key-value pair
A single entry in a dictionary, consisting of a key (used for lookup) and its associated value.
list
An ordered, mutable collection of values. Created with [] or list().
mutable
An object that can be changed after creation. Lists, dictionaries, and sets are mutable.
set
An unordered collection of unique elements. Supports mathematical set operations like union and intersection.
slice
A way to extract a portion of a sequence using start:stop:step notation.
tuple
An ordered, immutable collection of values. Created with () or tuple().
tuple unpacking
Assigning each element of a tuple (or other iterable) to a separate variable in one statement. Example: a, b, c = (1, 2, 3).
zero-based indexing
A numbering convention where the first element is at position 0, the second at position 1, and so on.