13 Collections

The previous chapter introduced Python’s scalar types: individual values like numbers, strings, and booleans. Real-world data, though, comes in groups. A database table is a collection of rows. A CSV file is a collection of records. An order has a collection of line items. To work with grouped data, you need collection types: ways of organizing multiple values together so you can process them as a unit.

This chapter covers Python’s four built-in collection types, each with a distinct purpose. You’ll learn how to create them, access their elements, and modify them. The next chapter will teach you how to make decisions about that data and process it systematically.

13.1 Lists

A list is an ordered, mutable collection of values. You create one with square brackets:

lists.py

products = ["Chai", "Chang", "Aniseed Syrup", "Chef Anton's Cajun Seasoning"]
prices = [18.00, 19.00, 10.00, 22.00]
mixed = [42, "hello", True, None, 3.14]  # Lists can hold any type
empty = []

Unlike arrays in many other languages, Python lists can hold values of different types. In practice, you’ll usually keep lists homogeneous (all the same type) because mixed-type lists are hard to reason about and harder to process.
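Here is a small illustration of why mixed types get in the way, using made-up prices. sum() works on a homogeneous list but fails as soon as one value has the wrong type:

mixed_types.py

```python
prices = [18.00, 19.00, 10.00]
print(sum(prices))  # 47.0

# One price was accidentally stored as a string
mixed = [18.00, "19.00", 10.00]
try:
    print(sum(mixed))
except TypeError as err:
    # sum() cannot add a float and a str, so processing stops with an error
    print(f"TypeError: {err}")
```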

13.1.1 Indexing

You access individual elements by their position, starting from zero:

indexing.py

products = ["Chai", "Chang", "Aniseed Syrup", "Tofu"]

products[0]    # "Chai"       (first element)
products[1]    # "Chang"      (second element)
products[3]    # "Tofu"       (fourth element)
products[-1]   # "Tofu"       (last element)
products[-2]   # "Aniseed Syrup" (second to last)

Zero-based indexing

Python, like most programming languages, counts from zero. The first element is at index 0, the second is at index 1, and so on. If you’re coming from MATLAB or R, where indexing starts at 1, this takes some adjustment. Off-by-one errors are among the most common bugs in programming, and many of them come from losing track of where counting starts.

A list with n elements has valid indices from 0 to n - 1.

Negative indices count backward from the end: -1 is the last element, -2 is the second to last, and so on. They let you reach elements near the end without computing the list’s length, so products[-1] is equivalent to products[len(products) - 1].
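The sketch below checks the index rules for a four-element list. Positive indices run from 0 to n - 1, negative indices run from -n to -1, and index n is one past the end:

index_bounds.py

```python
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu"]
n = len(products)  # 4

# Valid indices run from 0 to n - 1 counting forward, or -n to -1 counting backward
print(products[n - 1], products[-1])  # Tofu Tofu
print(products[0], products[-n])      # Chai Chai

try:
    products[n]  # One past the end: the classic off-by-one mistake
except IndexError as err:
    print(f"IndexError: {err}")  # IndexError: list index out of range
```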

13.1.2 Slicing

A slice extracts a portion of a list. The syntax is list[start:stop], where start is included and stop is excluded:

slicing.py

products = ["Chai", "Chang", "Aniseed Syrup", "Tofu", "Miso"]

products[1:3]   # ["Chang", "Aniseed Syrup"]  (index 1 and 2, not 3)
products[:2]    # ["Chai", "Chang"]            (from the beginning)
products[2:]    # ["Aniseed Syrup", "Tofu", "Miso"]  (to the end)
products[:]     # A copy of the entire list
products[::2]   # ["Chai", "Aniseed Syrup", "Miso"]  (every other element)

The start:stop:step form lets you skip elements. products[::2] takes every second element, and products[::-1] returns a reversed copy of the list, leaving the original unchanged.
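A slice always builds a new list, so slicing never changes the original. The example below combines a start index with a step, and shows that products[:] gives an equal but separate list:

slice_copies.py

```python
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu", "Miso"]

backwards = products[::-1]      # ["Miso", "Tofu", "Aniseed Syrup", "Chang", "Chai"]
odd_positions = products[1::2]  # ["Chang", "Tofu"] (start at index 1, step by 2)
copy = products[:]

print(products[0])        # Chai: the original list is unchanged
print(copy == products)   # True: same contents
print(copy is products)   # False: a separate list object
```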

Slices never raise IndexError

If you access an index that doesn’t exist, like products[99], Python raises an IndexError. But slices are forgiving: products[99:200] simply returns an empty list. This makes slicing safe for boundary cases.
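Slices clamp out-of-range bounds to the list’s length instead of failing. In the example below, a stop index past the end just means "to the end":

slice_bounds.py

```python
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu", "Miso"]

try:
    products[99]
except IndexError:
    print("products[99] raised IndexError")

print(products[99:200])  # []
print(products[3:99])    # ['Tofu', 'Miso']: the stop index is clamped to the list's length
```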

13.1.3 Mutability

Lists are mutable, meaning you can change their contents after creation:

list_mutation.py

products = ["Chai", "Chang", "Tofu"]

# Change an element
products[0] = "Earl Grey"       # ["Earl Grey", "Chang", "Tofu"]

# Add elements
products.append("Miso")         # ["Earl Grey", "Chang", "Tofu", "Miso"]
products.insert(1, "Matcha")    # ["Earl Grey", "Matcha", "Chang", "Tofu", "Miso"]

# Remove elements
products.pop()                  # Removes and returns "Miso"
products.remove("Chang")        # Removes the first occurrence of "Chang"

# Extend with another list
products.extend(["Soy Sauce", "Wasabi"])
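pop() and remove() can also fail. The short sketch below shows pop() with and without an index, plus the two errors you are most likely to hit: remove() on a value that isn’t there, and pop() on an empty list.

list_removal.py

```python
products = ["Earl Grey", "Matcha", "Tofu"]

last = products.pop()     # "Tofu": with no argument, pop() takes the last element
first = products.pop(0)   # "Earl Grey": pop(i) removes the element at index i
print(products)           # ['Matcha']

try:
    products.remove("Chang")  # remove() raises ValueError if the value isn't present
except ValueError:
    print("'Chang' is not in the list")

empty = []
try:
    empty.pop()               # pop() raises IndexError on an empty list
except IndexError:
    print("Cannot pop from an empty list")
```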

The distinction between append() and extend() matters. append() adds a single element to the end. extend() adds every element from another iterable:

append_vs_extend.py

a = [1, 2, 3]
a.append([4, 5])    # [1, 2, 3, [4, 5]]  ← a nested list!

b = [1, 2, 3]
b.extend([4, 5])    # [1, 2, 3, 4, 5]    ← flat list
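The same "new thing versus in place" split applies to the + and += operators on lists. + builds a new list. += extends the existing list, as extend() does. The is check below confirms that += keeps the same object:

plus_vs_iadd.py

```python
a = [1, 2, 3]
original = a           # A second name for the same list object

b = a + [4, 5]         # + builds a brand-new list
print(a)               # [1, 2, 3] (unchanged)
print(b)               # [1, 2, 3, 4, 5]

a += [4, 5]            # += on a list extends it in place, like a.extend([4, 5])
print(a is original)   # True: still the same object
print(original)        # [1, 2, 3, 4, 5]
```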

13.1.4 Useful List Operations

A few more operations you’ll use regularly:

list_ops.py

numbers = [3, 1, 4, 1, 5, 9, 2, 6]

len(numbers)            # 8 (number of elements)
min(numbers)            # 1
max(numbers)            # 9
sum(numbers)            # 31
numbers.count(1)        # 2 (how many times 1 appears)
numbers.index(5)        # 4 (index of first occurrence of 5)
sorted(numbers)         # [1, 1, 2, 3, 4, 5, 6, 9] (returns a new list)
numbers.sort()          # Sorts in place, returns None
numbers.reverse()       # Reverses in place: numbers is now [9, 6, 5, 4, 3, 2, 1, 1]

Note the difference between sorted(numbers), which returns a new sorted list and leaves the original unchanged, and numbers.sort(), which sorts the list in place and returns None. This distinction between “returns a new thing” and “modifies in place” is a recurring theme in Python.
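A common bug follows from this: assigning the result of sort() back to the variable replaces the list with None. The sketch below shows the bug and two correct alternatives. reverse=True is a standard keyword argument of sorted() and list.sort().

sort_pitfall.py

```python
numbers = [3, 1, 4, 1, 5]

# Bug: sort() returns None, so this throws the list away
numbers = numbers.sort()
print(numbers)  # None

# Fix 1: sort in place and keep using the same name
numbers = [3, 1, 4, 1, 5]
numbers.sort()
print(numbers)  # [1, 1, 3, 4, 5]

# Fix 2: build a new sorted list with sorted()
numbers = sorted([3, 1, 4, 1, 5], reverse=True)
print(numbers)  # [5, 4, 3, 1, 1]
```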

13.1.5 Exercises

1. Given prices = [18.00, 19.00, 10.00, 22.00, 23.25, 6.50], write expressions to get: the first three prices, the last two prices, and every other price starting from the first. What does prices[10] produce? What about prices[10:20]?
2. Predict the output of this code without running it, then verify:

setup.py

items = [1, 2, 3]
items.append([4, 5])
print(len(items))
print(items[-1])

3. Given products = ["Chai", "Chang", "Tofu", "Miso", "Chai"], use list methods to: count how many times "Chai" appears, find the index of "Tofu", and remove the first occurrence of "Chai". What does the list look like after the removal?
4. What is the difference between sorted(numbers) and numbers.sort()? Write code that demonstrates the difference by printing the return value of each.

Solutions

1. prices[:3] → [18.0, 19.0, 10.0], prices[-2:] → [23.25, 6.5], prices[::2] → [18.0, 10.0, 23.25]. prices[10] raises IndexError. prices[10:20] returns [] (slices never raise IndexError).

2. len(items) → 4 (not 5, because append added the list [4, 5] as a single element). items[-1] → [4, 5] (a nested list).

3. count() and index() only search the list. remove() modifies it in place and deletes only the first match.

solution.py

products = ["Chai", "Chang", "Tofu", "Miso", "Chai"]
products.count("Chai")  # 2
products.index("Tofu")  # 2
products.remove("Chai")
# products is now ["Chang", "Tofu", "Miso", "Chai"]

4. sorted(numbers) returns a NEW sorted list and leaves the original unchanged. numbers.sort() sorts in place and returns None.

solution.py

numbers = [3, 1, 4, 1, 5]
result_sorted = sorted(numbers)
print(result_sorted)  # [1, 1, 3, 4, 5]
print(numbers)        # [3, 1, 4, 1, 5] (unchanged)

result_sort = numbers.sort()
print(result_sort)    # None
print(numbers)        # [1, 1, 3, 4, 5] (modified in place)

13.2 Tuples and Unpacking

A tuple looks like a list but uses parentheses instead of square brackets, and it’s immutable: once created, you can’t change its contents.

tuples.py

coordinates = (40.4406, -79.9959)  # Pittsburgh, PA
product = ("Chai", 18.00, 39)
single = (42,)                      # Note the trailing comma for one-element tuples
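The trailing comma matters because parentheses on their own only group an expression. It is the comma that makes a tuple, so the parentheses are optional in a simple assignment. You can check this with type():

tuple_commas.py

```python
not_a_tuple = (42)          # Just the integer 42 in grouping parentheses
one_tuple = (42,)           # The comma makes this a one-element tuple
packed = "Chai", 18.00, 39  # Commas alone create a tuple; parentheses are optional

print(type(not_a_tuple))  # <class 'int'>
print(type(one_tuple))    # <class 'tuple'>
print(packed)             # ('Chai', 18.0, 39)
```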

Tuples are useful when you have a fixed collection of related values that shouldn’t change, like a coordinate pair or a database record. You can read elements by index, just like lists:

tuple_access.py

coordinates[0]   # 40.4406
coordinates[1]   # -79.9959

But you can’t modify them:

tuple_immutable.py

coordinates[0] = 41.0  # TypeError: 'tuple' object does not support item assignment
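To "change" a tuple, you build a new one and leave the original alone. The sketch below shows two ways: construct the new tuple directly, or convert to a list, edit it, and convert back with tuple().

tuple_rebuild.py

```python
coordinates = (40.4406, -79.9959)

# Build a new tuple instead of modifying the old one
moved = (41.0, coordinates[1])
print(moved)        # (41.0, -79.9959)
print(coordinates)  # (40.4406, -79.9959) (unchanged)

# Or round-trip through a list when you need several changes
as_list = list(coordinates)
as_list[0] = 41.0
print(tuple(as_list))  # (41.0, -79.9959)
```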

13.2.1 Tuple Unpacking

Tuples pair naturally with unpacking: assigning each element to a separate variable in one statement.

unpacking.py

product = ("Chai", 18.00, 39)

name, price, stock = product  # Unpacks into three variables
print(name)    # Chai
print(price)   # 18.0
print(stock)   # 39

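If you don’t need every value, use _ as a placeholder name for the ones you’ll ignore. This is a convention rather than special syntax: _ is an ordinary variable name that tells readers the value is unused.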

throwaway.py

name, _, stock = product  # We don't need the price right now

Unpacking works with any iterable, not just tuples. You’ll see it frequently with functions that return multiple values and with loops over dictionaries.
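The example below uses the built-in divmod(), which returns a (quotient, remainder) tuple. The 39 units packed in boxes of 12 are only an illustration. It also unpacks a list, swaps two variables, and shows what happens when the counts don’t match:

unpacking_more.py

```python
# divmod() returns a (quotient, remainder) tuple
full_boxes, leftover = divmod(39, 12)
print(full_boxes, leftover)  # 3 3

# Unpacking a list works the same way
first, second = ["Chai", "Chang"]
first, second = second, first   # Swap values without a temporary variable
print(first, second)            # Chang Chai

# The number of names must match the number of values
try:
    name, price = ("Chai", 18.00, 39)
except ValueError:
    print("Too many values to unpack")
```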

13.3 Dictionaries

A dictionary stores key-value pairs. Each key maps to a value, like a lookup table. You create one with curly braces:

dicts.py

product = {
    "name": "Chai",
    "category": "Beverages",
    "unit_price": 18.00,
    "units_in_stock": 39,
    "discontinued": False,
}

If you’ve used SQL, think of a dictionary as a single row from a table, where the keys are column names and the values are cell values.

13.3.1 Accessing Values

You access values by key, not by position:

dict_access.py

product["name"]          # "Chai"
product["unit_price"]    # 18.0
product["color"]         # KeyError! Key doesn't exist

Accessing a key that doesn’t exist raises a KeyError. To avoid this, use the .get() method, which returns a default value instead:

dict_get.py

product.get("name")              # "Chai"
product.get("color")             # None (default when key is missing)
product.get("color", "Unknown")  # "Unknown" (custom default)

The .get() method is safer than bracket access when you aren’t certain a key exists. It’s especially useful when processing data that might have missing fields.
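The sketch below applies .get() to a few hypothetical rows where one record lacks a "category" field. Bracket access would raise KeyError on the second row, but .get() supplies a fallback. It also previews a for loop (covered in the next chapter) and the in operator, which tests a dictionary’s keys:

missing_fields.py

```python
rows = [
    {"name": "Chai", "unit_price": 18.00, "category": "Beverages"},
    {"name": "Chang", "unit_price": 19.00},  # This row has no "category" field
]

for row in rows:
    category = row.get("category", "Uncategorized")
    print(f"{row['name']}: {category}")

print("category" in rows[1])  # False: `in` checks a dictionary's keys
```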

13.3.2 Modifying Dictionaries

Dictionaries are mutable. You can add, change, and remove key-value pairs:

dict_modify.py

product = {"name": "Chai", "price": 18.00}

# Add a new key
product["category"] = "Beverages"

# Change an existing key
product["price"] = 19.50

# Remove a key
del product["category"]

# Remove and return a value
price = product.pop("price")          # 19.5
missing = product.pop("color", None)  # None (no error if key is missing)
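Assigning to an existing key replaces its value, because keys are unique and no duplicate entry is created. del behaves like bracket access and raises KeyError for a missing key, while pop() with a default does not:

dict_errors.py

```python
product = {"name": "Chai", "price": 18.00}
product["price"] = 19.50           # Existing key: value replaced, no duplicate key created
product["category"] = "Beverages"
print(len(product))                # 3

try:
    del product["color"]           # del raises KeyError on a missing key
except KeyError as err:
    print(f"KeyError: {err}")      # KeyError: 'color'

print(product.pop("color", None))  # None: pop() with a default never raises
```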

13.3.3 Iterating Over Dictionaries

Dictionaries provide three views of their contents:

dict_iteration.py

product = {"name": "Chai", "price": 18.00, "stock": 39}

product.keys()     # dict_keys(["name", "price", "stock"])
product.values()   # dict_values(["Chai", 18.0, 39])
product.items()    # dict_items([("name", "Chai"), ("price", 18.0), ("stock", 39)])

The .items() method is especially useful because it gives you both keys and values, which you can unpack in a loop:

dict_loop.py

for key, value in product.items():
    print(f"{key}: {value}")

output

name: Chai
price: 18.0
stock: 39
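The objects returned by keys(), values(), and items() are views, not lists. They reflect later changes to the dictionary, and you can turn one into a list with list(). Since Python 3.7, dictionaries preserve insertion order, so a new key appears at the end:

dict_views.py

```python
product = {"name": "Chai", "price": 18.00, "stock": 39}
keys = product.keys()

product["discontinued"] = False  # Add a key after creating the view
print(list(keys))                # ['name', 'price', 'stock', 'discontinued']
print("stock" in keys)           # True
print(len(product.items()))      # 4
```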

13.3.4 Nested Dictionaries

Dictionaries can contain other dictionaries, creating hierarchical structures that mirror JSON data:

nested_dict.py

order = {
    "order_id": 10248,
    "customer": {
        "id": "VINET",
        "name": "Vins et alcools Chevalier",
        "country": "France",
    },
    "items": [
        {"product": "Quiche", "quantity": 12, "unit_price": 9.20},
        {"product": "Mozzarella", "quantity": 10, "unit_price": 34.80},
    ],
}

order["customer"]["name"]     # "Vins et alcools Chevalier"
order["items"][0]["quantity"]  # 12

This structure mirrors how data often arrives in practice. Many database libraries can return each row of a query result as a dictionary, so multiple rows form a list of dictionaries. More complex queries, and JSON APIs, return nested structures like the order example above.
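The JSON connection is direct. The standard library’s json.loads() turns JSON objects into dictionaries and JSON arrays into lists, so you navigate the parsed result with the same indexing shown above. The JSON text here is a trimmed version of the order example:

parse_json.py

```python
import json

raw = (
    '{"order_id": 10248, '
    '"customer": {"id": "VINET", "country": "France"}, '
    '"items": [{"product": "Quiche", "quantity": 12}]}'
)

order = json.loads(raw)  # JSON objects become dicts, JSON arrays become lists
print(type(order))                   # <class 'dict'>
print(type(order["items"]))          # <class 'list'>
print(order["customer"]["country"])  # France
print(order["items"][0]["quantity"]) # 12
```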

13.3.5 Exercises

1. Given the Northwind product dictionary below, write expressions to: get the product name, safely get a "color" key that doesn’t exist (returning "N/A"), and add a new key "category" with value "Beverages".

setup.py

product = {"name": "Chai", "price": 18.00, "stock": 39, "discontinued": False}

2. Given order (the nested dictionary from the chapter), write an expression to get the quantity of the second item in the order. Then write a loop that prints each item’s product name and total cost (quantity × unit_price).
3. Given two sets of customer countries from different regions (sets are covered in Section 13.4):

setup.py

europe = {"France", "Germany", "UK", "Spain"}
americas = {"USA", "Brazil", "Canada", "France"}

Use set operations to find: countries that appear in both regions (in a clean dataset each country would belong to only one region, but "France" is in both here), countries unique to Europe, and all countries combined.

4. Explain why {} creates an empty dictionary, not an empty set. How do you create each?

Solutions

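1. Use bracket access for the name, .get() with a default for the missing key, and assignment to add a key:

solution.py

product["name"]                      # "Chai"
product.get("color", "N/A")          # "N/A"
product["category"] = "Beverages"    # Adds the key

2. Chain indexes to reach into the nested structure, then loop over the list of item dictionaries:

solution.py

order["items"][1]["quantity"]  # 10

for item in order["items"]:
    total = item["quantity"] * item["unit_price"]
    print(f"{item['product']}: ${total:.2f}")

3. Sets are unordered, so Python may display these elements in a different order:

solution.py

europe & americas    # {"France"} (intersection)
europe - americas    # {"Germany", "UK", "Spain"} (difference)
europe | americas    # {"France", "Germany", "UK", "Spain", "USA", "Brazil", "Canada"} (union)

4. {} creates an empty dictionary because curly braces meant "dictionary" in Python long before sets existed. When set literals like {"France", "Germany"} were added, {} kept its original meaning. Use set() to create an empty set and {} (or dict()) for an empty dictionary.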

13.4 Sets

A set is an unordered collection of unique elements:

sets.py

categories = {"Beverages", "Condiments", "Seafood", "Beverages"}
print(categories)  # {"Beverages", "Condiments", "Seafood"} in some order; the duplicate is removed

Sets are useful for removing duplicates and for fast membership tests. They also support the familiar operations of set algebra:

set_operations.py

a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

a | b    # {1, 2, 3, 4, 5, 6, 7, 8}  Union
a & b    # {4, 5}                      Intersection
a - b    # {1, 2, 3}                   Difference (in a but not in b)
a ^ b    # {1, 2, 3, 6, 7, 8}         Symmetric difference (in one but not both)

If you remember set operations from a math or statistics course, these work exactly the same way. If you’ve used SQL’s UNION, INTERSECT, and EXCEPT, sets are the Python equivalent.
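The two everyday uses, deduplication and membership testing, look like this with a made-up list of order countries. Converting to a set drops repeats, and sorted() gives a predictable order for display. Membership tests on a set use hashing, so on average they take about the same time regardless of the set’s size. The same test on a list has to scan its elements.

set_uses.py

```python
orders_by_country = ["France", "Germany", "France", "USA", "Germany", "France"]

countries = set(orders_by_country)  # Duplicates are dropped
print(len(countries))               # 3
print("USA" in countries)           # True
print("Japan" in countries)         # False
print(sorted(countries))            # ['France', 'Germany', 'USA']
```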

Creating an empty set

{} creates an empty dictionary, not an empty set. To create an empty set, use set():

empty_set.py

empty_dict = {}       # This is a dictionary
empty_set = set()     # This is a set
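You can confirm the types with type(). A set built with set() grows with .add(), and adding an element that is already present has no effect:

empty_set_check.py

```python
empty_dict = {}
empty_set = set()
print(type(empty_dict))  # <class 'dict'>
print(type(empty_set))   # <class 'set'>

seen = set()
seen.add("Beverages")
seen.add("Beverages")    # Adding a duplicate changes nothing
print(seen)              # {'Beverages'}
print(len(seen))         # 1
```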

Summary

Python provides four built-in collection types, each serving a specific purpose. Lists store ordered, mutable sequences you access by position, making them ideal for processing data where order matters. Tuples offer immutable, fixed-length collections for grouping related values, and their unpacking feature lets you elegantly assign multiple values in a single statement. Dictionaries map keys to values, mirroring database records and providing efficient lookup by name rather than position. Sets store unique elements and support mathematical operations like union and intersection, useful for deduplication and membership testing.

These collections are the foundation for data processing. In the next chapter, you’ll learn how to make decisions about data and process collections systematically using control flow: if statements for conditionals and for/while loops for repetition. Together, collections and control flow form the core of Python programming.

Exercises

1. Tuple Unpacking and Structured Data. You have a tuple representing a product from the Northwind database: product = ("Chai", 18.00, 39, False) (name, price, units_in_stock, discontinued). Unpack this tuple into four variables and print a statement like: "Chai (18.00) - 39 units in stock - Active". Create three similar tuples for “Chang”, “Aniseed Syrup”, and “Chef Anton’s Cajun Seasoning” with realistic Northwind values, unpack each one, and print their summaries.
2. Dictionary Building with zip(). You have two lists: product names ["Chai", "Chang", "Aniseed Syrup"] and their prices [18.00, 19.00, 10.00]. Use zip() to combine them into a dictionary where each product name is a key and its price is the value. Then use this dictionary to look up the price of “Chang” and print all products with their prices in the format "Product: $price".
3. Set Operations with Northwind Data. You have two sets representing customer countries from different sales regions: region_a = {"France", "Germany", "UK", "Spain", "Italy"} and region_b = {"Germany", "Italy", "USA", "Brazil", "Canada"}. Use set operations to find: countries appearing in both regions (intersection), countries unique to Region A, countries unique to Region B, and the complete list of countries across both regions (union).
4. Northwind Inventory Report (Capstone). Create a comprehensive inventory report by combining lists, dictionaries, and sets. You’ll track three product categories, their products, and warehouse locations. Structure: a dictionary where keys are category names and values are lists of product names. Another dictionary for product-to-warehouse mappings (each product maps to a set of warehouse locations). Write code to: (a) print all products in the “Beverages” category, (b) find which warehouses stock “Chai”, (c) identify products that are stocked in multiple warehouses, and (d) count the total number of unique warehouse locations across all products.

Solutions

Tuple Unpacking and Structured Data

solution.py

# Chai
product = ("Chai", 18.00, 39, False)
name, price, units_in_stock, discontinued = product
status = "Active" if not discontinued else "Discontinued"
print(f"{name} ({price:.2f}) - {units_in_stock} units in stock - {status}")

# Chang
product = ("Chang", 19.00, 17, False)
name, price, units_in_stock, discontinued = product
status = "Active" if not discontinued else "Discontinued"
print(f"{name} ({price:.2f}) - {units_in_stock} units in stock - {status}")

# Aniseed Syrup
product = ("Aniseed Syrup", 10.00, 13, False)
name, price, units_in_stock, discontinued = product
status = "Active" if not discontinued else "Discontinued"
print(f"{name} ({price:.2f}) - {units_in_stock} units in stock - {status}")

# Chef Anton's Cajun Seasoning
product = ("Chef Anton's Cajun Seasoning", 22.00, 53, False)
name, price, units_in_stock, discontinued = product
status = "Active" if not discontinued else "Discontinued"
print(f"{name} ({price:.2f}) - {units_in_stock} units in stock - {status}")

Dictionary Building with zip()

solution.py

product_names = ["Chai", "Chang", "Aniseed Syrup"]
prices = [18.00, 19.00, 10.00]

# Build dictionary using zip()
product_prices = dict(zip(product_names, prices))

# Look up Chang's price
chang_price = product_prices["Chang"]
print(f"Chang costs ${chang_price:.2f}")

# Print all products with prices
for product, price in product_prices.items():
    print(f"{product}: ${price:.2f}")

Set Operations with Northwind Data

solution.py

region_a = {"France", "Germany", "UK", "Spain", "Italy"}
region_b = {"Germany", "Italy", "USA", "Brazil", "Canada"}

# Intersection: countries in both regions
both = region_a & region_b
print(f"In both regions: {both}")

# Unique to Region A
unique_a = region_a - region_b
print(f"Unique to Region A: {unique_a}")

# Unique to Region B
unique_b = region_b - region_a
print(f"Unique to Region B: {unique_b}")

# Union: all countries
all_countries = region_a | region_b
print(f"All countries: {all_countries}")

Northwind Inventory Report

solution.py

# Category to products mapping
categories = {
    "Beverages": ["Chai", "Chang", "Guarana Fantastica"],
    "Condiments": ["Chef Anton's Cajun Seasoning", "Louisiana Hot Spicy Sauce"],
    "Seafood": ["Carnarvon Tigers", "Escargots de Bourgogne"],
}

# Product to warehouses mapping (each product in a set of warehouses)
product_warehouses = {
    "Chai": {"New York", "London", "Tokyo"},
    "Chang": {"New York", "Sydney"},
    "Guarana Fantastica": {"Rio", "New York"},
    "Chef Anton's Cajun Seasoning": {"New Orleans"},
    "Louisiana Hot Spicy Sauce": {"New Orleans", "New York"},
    "Carnarvon Tigers": {"Sydney"},
    "Escargots de Bourgogne": {"London", "Paris"},
}

# (a) Print all products in Beverages category
print("Beverages category:")
for product in categories["Beverages"]:
    print(f"  - {product}")

# (b) Find warehouses stocking Chai
chai_warehouses = product_warehouses["Chai"]
print(f"\nWarehouses stocking Chai: {chai_warehouses}")

# (c) Identify products stocked in multiple warehouses
multi_warehouse = [product for product, warehouses in product_warehouses.items()
                   if len(warehouses) > 1]
print(f"\nProducts in multiple warehouses: {multi_warehouse}")

# (d) Count unique warehouse locations
all_warehouses = set()
for warehouses in product_warehouses.values():
    all_warehouses = all_warehouses | warehouses
print(f"\nTotal unique warehouses: {len(all_warehouses)} - {all_warehouses}")

Glossary

dictionary: A mutable collection of key-value pairs. Keys must be unique and hashable. Created with {} or dict().
immutable: An object that cannot be changed after creation. Strings, tuples, and frozensets are immutable.
iterable: Any object that can be looped over with a for statement. Lists, tuples, dictionaries, sets, strings, and ranges are all iterable.
key-value pair: A single entry in a dictionary, consisting of a key (used for lookup) and its associated value.
list: An ordered, mutable collection of values. Created with [] or list().
mutable: An object that can be changed after creation. Lists, dictionaries, and sets are mutable.
set: An unordered collection of unique elements. Supports mathematical set operations like union and intersection.
slice: A way to extract a portion of a sequence using start:stop:step notation.
tuple: An ordered, immutable collection of values. Created with () or tuple().
tuple unpacking: Assigning each element of a tuple (or other iterable) to a separate variable in one statement. Example: a, b, c = (1, 2, 3).
zero-based indexing: A numbering convention where the first element is at position 0, the second at position 1, and so on.