13 Collections
The previous chapter introduced Python’s scalar types: individual values like numbers, strings, and booleans. Real-world data, though, comes in groups. A database table is a collection of rows. A CSV file is a collection of records. An order has a collection of line items. To work with grouped data, you need collection types: ways of organizing multiple values together so you can process them as a unit.
This chapter covers Python’s four built-in collection types, each with a distinct purpose. You’ll learn how to create them, access their elements, and modify them. The next chapter will teach you how to make decisions about that data and process it systematically.
13.1 Lists
A list is an ordered, mutable collection of values. You create one with square brackets:
lists.py
products = ["Chai", "Chang", "Aniseed Syrup", "Chef Anton's Cajun Seasoning"]
prices = [18.00, 19.00, 10.00, 22.00]
mixed = [42, "hello", True, None, 3.14] # Lists can hold any type
empty = []
Unlike arrays in many other languages, Python lists can hold values of different types. In practice, you’ll usually keep lists homogeneous (all the same type) because mixed-type lists are hard to reason about and harder to process.
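To make the point about mixed-type lists concrete, here is a minimal sketch that reuses the prices and mixed lists from above. The names total and error_message are illustrative only.

```python
prices = [18.00, 19.00, 10.00, 22.00]
mixed = [42, "hello", True, None, 3.14]

# A homogeneous list works with aggregate functions such as sum()
total = sum(prices)

# The same call fails on the mixed list: an int cannot be added to a str
try:
    sum(mixed)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(total)          # 69.0
print(error_message)  # The TypeError message describing the failed addition
```

The same kind of failure shows up with sorted(mixed), because Python cannot compare a string with a number.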
13.1.1 Indexing
You access individual elements by their position, starting from zero:
indexing.py
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu"]
products[0] # "Chai" (first element)
products[1] # "Chang" (second element)
products[3] # "Tofu" (fourth element)
products[-1] # "Tofu" (last element)
products[-2] # "Aniseed Syrup" (second to last)
Python, like most programming languages, counts from zero. The first element is at index 0, the second is at index 1, and so on. If you’re coming from MATLAB, where indexing starts at 1, this takes some adjustment. Off-by-one errors are one of the most common bugs in programming, and they often come from forgetting which index a language starts at.
A list with n elements has valid indices from 0 to n - 1.
Negative indices count backward from the end. -1 is the last element, -2 is the second to last, and so on. This is useful when you don’t know the length of a list but need to access elements near the end.
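The relationship between positive and negative indices can be checked directly. This is a small sketch using the products list from above; the variable names are illustrative.

```python
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu"]
n = len(products)  # 4

# The last valid index is n - 1, which is the same element as index -1
last_by_length = products[n - 1]    # "Tofu"
last_by_negative = products[-1]     # "Tofu"

# In general, products[-k] refers to the same element as products[n - k]
same_element = products[-2] == products[n - 2]  # True

# Index n is one past the end, so it raises IndexError
try:
    products[n]
    out_of_range = False
except IndexError:
    out_of_range = True
```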
13.1.2 Slicing
A slice extracts a portion of a list. The syntax is list[start:stop], where start is included and stop is excluded:
slicing.py
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu", "Miso"]
products[1:3] # ["Chang", "Aniseed Syrup"] (index 1 and 2, not 3)
products[:2] # ["Chai", "Chang"] (from the beginning)
products[2:] # ["Aniseed Syrup", "Tofu", "Miso"] (to the end)
products[:] # A copy of the entire list
products[::2] # ["Chai", "Aniseed Syrup", "Miso"] (every other element)
The start:stop:step form lets you skip elements. products[::2] takes every second element. products[::-1] reverses the list entirely.
If you access an index that doesn’t exist, like products[99], Python raises an IndexError. But slices are forgiving: products[99:200] simply returns an empty list. This makes slicing safe for boundary cases.
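The difference between indexing and slicing at the boundaries can be sketched as follows, using the same products list. The variable names are illustrative.

```python
products = ["Chai", "Chang", "Aniseed Syrup", "Tofu", "Miso"]

# Indexing past the end raises IndexError
try:
    item = products[99]
except IndexError:
    item = None

# Slicing past the end returns whatever is available
beyond = products[99:200]   # []
clipped = products[3:200]   # ["Tofu", "Miso"]

# A slice always builds a new list, so the original keeps its order
reversed_copy = products[::-1]
first_is_still_chai = products[0] == "Chai"  # True
```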
13.1.3 Mutability
Lists are mutable, meaning you can change their contents after creation:
list_mutation.py
products = ["Chai", "Chang", "Tofu"]
# Change an element
products[0] = "Earl Grey" # ["Earl Grey", "Chang", "Tofu"]
# Add elements
products.append("Miso") # ["Earl Grey", "Chang", "Tofu", "Miso"]
products.insert(1, "Matcha") # ["Earl Grey", "Matcha", "Chang", "Tofu", "Miso"]
# Remove elements
products.pop() # Removes and returns "Miso"
products.remove("Chang") # Removes the first occurrence of "Chang"
# Extend with another list
products.extend(["Soy Sauce", "Wasabi"])
The distinction between append() and extend() matters. append() adds a single element to the end. extend() adds every element from another iterable:
append_vs_extend.py
a = [1, 2, 3]
a.append([4, 5]) # [1, 2, 3, [4, 5]] ← a nested list!
b = [1, 2, 3]
b.extend([4, 5]) # [1, 2, 3, 4, 5] ← flat list
13.1.4 Useful List Operations
A few more operations you’ll use regularly:
list_ops.py
numbers = [3, 1, 4, 1, 5, 9, 2, 6]
len(numbers) # 8 (number of elements)
sorted(numbers) # [1, 1, 2, 3, 4, 5, 6, 9] (returns a new list)
numbers.sort() # Sorts in place, returns None
numbers.reverse() # Reverses in place
min(numbers) # 1
max(numbers) # 9
sum(numbers) # 31
numbers.count(1) # 2 (how many times 1 appears)
numbers.index(5) # 2 (index of first occurrence of 5; the list is now [9, 6, 5, 4, 3, 2, 1, 1])
Note the difference between sorted(numbers), which returns a new sorted list and leaves the original unchanged, and numbers.sort(), which sorts the list in place and returns None. This distinction between “returns a new thing” and “modifies in place” is a recurring theme in Python.
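The same “new thing” versus “in place” split applies to reversing a list. The sketch below contrasts the [::-1] slice with reverse(), and shows a common slip with sort(). The variable names are illustrative.

```python
numbers = [3, 1, 4]

# Slicing returns a new list; the original is untouched
backward = numbers[::-1]             # [4, 1, 3]
unchanged = numbers == [3, 1, 4]     # True

# reverse() modifies the list in place and returns None
result = numbers.reverse()           # result is None
after_reverse = list(numbers)        # [4, 1, 3]

# A common slip: assigning the return value of an in-place method
oops = numbers.sort()                # oops is None; numbers is now [1, 3, 4]
```

If a variable unexpectedly holds None after a list operation, an in-place method assigned to a name is a likely cause.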
13.1.5 Exercises
- Given prices = [18.00, 19.00, 10.00, 22.00, 23.25, 6.50], write expressions to get: the first three prices, the last two prices, and every other price starting from the first. What does prices[10] produce? What about prices[10:20]?
- Predict the output of this code without running it, then verify:
solution.py
items = [1, 2, 3]
items.append([4, 5])
print(len(items))
print(items[-1])
- Given products = ["Chai", "Chang", "Tofu", "Miso", "Chai"], use list methods to: count how many times "Chai" appears, find the index of "Tofu", and remove the first occurrence of "Chai". What does the list look like after the removal?
- What is the difference between sorted(numbers) and numbers.sort()? Write code that demonstrates the difference by printing the return value of each.
1. prices[:3] → [18.0, 19.0, 10.0], prices[-2:] → [23.25, 6.50], prices[::2] → [18.0, 10.0, 23.25]. prices[10] raises IndexError. prices[10:20] returns [] (slices never raise IndexError).
2. len(items) → 4 (not 5, because append added the list [4, 5] as a single element). items[-1] → [4, 5] (a nested list).
3.
solution.py
products = ["Chai", "Chang", "Tofu", "Miso", "Chai"]
products.count("Chai") # 2
products.index("Tofu") # 2
products.remove("Chai")
# products is now ["Chang", "Tofu", "Miso", "Chai"]
4. sorted(numbers) returns a new sorted list and leaves the original unchanged. numbers.sort() sorts in place and returns None.
solution.py
numbers = [3, 1, 4, 1, 5]
result_sorted = sorted(numbers)
print(result_sorted) # [1, 1, 3, 4, 5]
print(numbers) # [3, 1, 4, 1, 5] (unchanged)
result_sort = numbers.sort()
print(result_sort) # None
print(numbers) # [1, 1, 3, 4, 5] (modified in place)
13.2 Tuples and Unpacking
A tuple looks like a list but uses parentheses instead of square brackets, and it’s immutable: once created, you can’t change its contents.
tuples.py
coordinates = (40.4406, -79.9959) # Pittsburgh, PA
product = ("Chai", 18.00, 39)
single = (42,) # Note the trailing comma for one-element tuples
Tuples are useful when you have a fixed collection of related values that shouldn’t change, like a coordinate pair or a database record. You can read elements by index, just like lists:
tuple_access.py
coordinates[0] # 40.4406
coordinates[1] # -79.9959
But you can’t modify them:
tuple_immutable.py
coordinates[0] = 41.0 # TypeError: 'tuple' object does not support item assignment
13.2.1 Tuple Unpacking
The most powerful feature of tuples is unpacking: assigning each element to a separate variable in one statement.
unpacking.py
product = ("Chai", 18.00, 39)
name, price, stock = product # Unpacks into three variables
print(name) # "Chai"
print(price) # 18.0
print(stock) # 39
If you don’t need every value, use _ as a throwaway placeholder:
throwaway.py
name, _, stock = product # We don't need the price right now
Unpacking works with any iterable, not just tuples. You’ll see it frequently with functions that return multiple values and with loops over dictionaries.
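As a minimal sketch of unpacking beyond plain tuples, the example below unpacks a function’s return value and a list. The function price_range is hypothetical and exists only for illustration.

```python
def price_range(prices):
    """Return the lowest and highest price as a two-element tuple."""
    return min(prices), max(prices)

# Unpack the returned tuple into two variables
lowest, highest = price_range([18.00, 19.00, 10.00, 22.00])  # 10.0 and 22.0

# Unpacking a list works the same way
first, second = ["Chai", "Chang"]

# Unpacking also gives a compact way to swap two variables
first, second = second, first  # first is "Chang", second is "Chai"

# The number of names must match the number of elements
try:
    a, b = ("Chai", 18.00, 39)
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)
```

A mismatch between the number of names and the number of elements raises a ValueError rather than silently dropping values.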
13.3 Dictionaries
A dictionary stores key-value pairs. Each key maps to a value, like a lookup table. You create one with curly braces:
dicts.py
product = {
    "name": "Chai",
    "category": "Beverages",
    "unit_price": 18.00,
    "units_in_stock": 39,
    "discontinued": False,
}
If you’ve used SQL, think of a dictionary as a single row from a table, where the keys are column names and the values are cell values.
13.3.1 Accessing Values
You access values by key, not by position:
dict_access.py
product["name"] # "Chai"
product["unit_price"] # 18.0
product["color"] # KeyError! Key doesn't exist
Accessing a key that doesn’t exist raises a KeyError. To avoid this, use the .get() method, which returns a default value instead:
dict_get.py
product.get("name") # "Chai"
product.get("color") # None (default when key is missing)
product.get("color", "Unknown") # "Unknown" (custom default)
The .get() method is safer than bracket access when you aren’t certain a key exists. It’s especially useful when processing data that might have missing fields.
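Here is a small sketch of .get() applied to records with missing fields. The records list and its values are made-up sample data, not taken from a real dataset.

```python
# Made-up sample records; the second one has no "color" field
records = [
    {"name": "Chai", "unit_price": 18.00, "color": "Brown"},
    {"name": "Tofu", "unit_price": 23.25},
]

# Collect a color for every record, falling back to a default
colors = []
for record in records:
    colors.append(record.get("color", "Unknown"))

print(colors)  # ['Brown', 'Unknown']

# .get() only reads; it does not add the missing key to the dictionary
still_missing = "color" not in records[1]  # True
```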
13.3.2 Modifying Dictionaries
Dictionaries are mutable. You can add, change, and remove key-value pairs:
dict_modify.py
product = {"name": "Chai", "price": 18.00}
# Add a new key
product["category"] = "Beverages"
# Change an existing key
product["price"] = 19.50
# Remove a key
del product["category"]
# Remove and return a value
price = product.pop("price") # 19.5
missing = product.pop("color", None) # None (no error if key is missing)
13.3.3 Iterating Over Dictionaries
Dictionaries provide three views of their contents:
dict_iteration.py
product = {"name": "Chai", "price": 18.00, "stock": 39}
product.keys() # dict_keys(["name", "price", "stock"])
product.values() # dict_values(["Chai", 18.0, 39])
product.items() # dict_items([("name", "Chai"), ("price", 18.0), ("stock", 39)])
The .items() method is especially useful because it gives you both keys and values, which you can unpack in a loop:
dict_loop.py
for key, value in product.items():
    print(f"{key}: {value}")
output
name: Chai
price: 18.0
stock: 39
13.3.4 Nested Dictionaries
Dictionaries can contain other dictionaries, creating hierarchical structures that mirror JSON data:
nested_dict.py
order = {
    "order_id": 10248,
    "customer": {
        "id": "VINET",
        "name": "Vins et alcools Chevalier",
        "country": "France",
    },
    "items": [
        {"product": "Quiche", "quantity": 12, "unit_price": 9.20},
        {"product": "Mozzarella", "quantity": 10, "unit_price": 34.80},
    ],
}
order["customer"]["name"] # "Vins et alcools Chevalier"
order["items"][0]["quantity"] # 12
This structure mirrors how you’ll see data in the real world. A database query might return rows; each row is a dictionary. Multiple rows form a list of dictionaries. A more complex query might return nested structures like the order example above.
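The “rows as a list of dictionaries” idea can be sketched as follows. The rows mirror the items in the order example; the "shipping" and "region" keys are hypothetical and used only to show a safe lookup of a missing nested key.

```python
# Each row is a dictionary; several rows form a list of dictionaries
rows = [
    {"product": "Quiche", "quantity": 12, "unit_price": 9.20},
    {"product": "Mozzarella", "quantity": 10, "unit_price": 34.80},
]

# Pull one "column" out of every row
names = []
for row in rows:
    names.append(row["product"])

# A trimmed copy of the nested order from above
order = {"order_id": 10248, "customer": {"id": "VINET", "country": "France"}}
country = order["customer"]["country"]  # "France"

# Chaining .get() with an empty-dict default avoids a KeyError
# when an intermediate key (here the hypothetical "shipping") is absent
region = order.get("shipping", {}).get("region", "Unknown")
```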
13.3.5 Exercises
- Given the Northwind product dictionary below, write expressions to: get the product name, safely get a "color" key that doesn’t exist (returning "N/A"), and add a new key "category" with value "Beverages".
solution.py
product = {"name": "Chai", "price": 18.00, "stock": 39, "discontinued": False}
- Given order (the nested dictionary from the chapter), write an expression to get the quantity of the second item in the order. Then write a loop that prints each item’s product name and total cost (quantity × unit_price).
- Given two sets of customer countries from different regions:
setup.py
europe = {"France", "Germany", "UK", "Spain"}
americas = {"USA", "Brazil", "Canada", "France"}
Use set operations to find: countries that appear in both regions (there shouldn’t be any in a real dataset, but "France" is in both here), countries unique to Europe, and all countries combined.
- Explain why {} creates an empty dictionary, not an empty set. How do you create each?
1.
solution.py
product["name"] # "Chai"
product.get("color", "N/A") # "N/A"
product["category"] = "Beverages" # Adds the key
2.
solution.py
order["items"][1]["quantity"] # 10
for item in order["items"]:
    total = item["quantity"] * item["unit_price"]
    print(f"{item['product']}: ${total:.2f}")
3.
solution.py
europe & americas # {"France"} (intersection)
europe - americas # {"Germany", "UK", "Spain"} (difference)
europe | americas # {"France", "Germany", "UK", "Spain", "USA", "Brazil", "Canada"} (union)
4. {} creates an empty dictionary because dictionaries were in Python before sets. Use set() to create an empty set and {} for an empty dictionary.
13.4 Sets
A set is an unordered collection of unique elements:
sets.py
categories = {"Beverages", "Condiments", "Seafood", "Beverages"}
print(categories) # {"Beverages", "Condiments", "Seafood"} ← duplicate removed; display order may vary
Sets are useful for two things: removing duplicates and performing membership tests efficiently.
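Both uses can be sketched in a few lines. The order_categories list is made-up sample data, and the variable names are illustrative.

```python
# Made-up sample data with repeated categories
order_categories = ["Beverages", "Condiments", "Beverages", "Seafood", "Condiments"]

# Removing duplicates: convert the list to a set
unique_categories = set(order_categories)
unique_count = len(unique_categories)  # 3

# Membership tests use the in operator
has_seafood = "Seafood" in unique_categories   # True
has_produce = "Produce" in unique_categories   # False

# Sets have no defined order; sorted() returns a list with a predictable one
ordered = sorted(unique_categories)  # ["Beverages", "Condiments", "Seafood"]
```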
set_operations.py
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}
a | b # {1, 2, 3, 4, 5, 6, 7, 8} Union
a & b # {4, 5} Intersection
a - b # {1, 2, 3} Difference (in a but not in b)
a ^ b # {1, 2, 3, 6, 7, 8} Symmetric difference (in one but not both)
If you remember set operations from a math or statistics course, these work exactly the same way. If you’ve used SQL’s UNION, INTERSECT, and EXCEPT, sets are the Python equivalent.
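Each operator also has a method form, which some readers find easier to map onto the SQL keywords. The sketch below reuses a and b from above; the variable names are illustrative.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}

# Method forms produce the same results as the operators
union_matches = a.union(b) == (a | b)                        # like UNION
intersection_matches = a.intersection(b) == (a & b)          # like INTERSECT
difference_matches = a.difference(b) == (a - b)              # like EXCEPT
symmetric_matches = a.symmetric_difference(b) == (a ^ b)

# Unlike the operators, the methods accept any iterable, such as a list
from_list = a.intersection([4, 5, 99])  # {4, 5}

# Subset tests are available too
is_subset = {4, 5}.issubset(a)  # True
```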
{} creates an empty dictionary, not an empty set. To create an empty set, use set():
empty_set.py
empty_dict = {} # This is a dictionary
empty_set = set() # This is a set
Summary
Python provides four built-in collection types, each serving a specific purpose. Lists store ordered, mutable sequences you access by position, making them ideal for processing data where order matters. Tuples offer immutable, fixed-length collections for grouping related values, and their unpacking feature lets you elegantly assign multiple values in a single statement. Dictionaries map keys to values, mirroring database records and providing efficient lookup by name rather than position. Sets store unique elements and support mathematical operations like union and intersection, useful for deduplication and membership testing.
These collections are the foundation for data processing. In the next chapter, you’ll learn how to make decisions about data and process collections systematically using control flow: if statements for conditionals and for/while loops for repetition. Together, collections and control flow form the core of Python programming.
Glossary
- dictionary
- A mutable collection of key-value pairs. Keys must be unique and hashable. Created with {} or dict().
- immutable
- An object that cannot be changed after creation. Strings, tuples, and frozensets are immutable.
- iterable
- Any object that can be looped over with a for statement. Lists, tuples, dictionaries, sets, strings, and ranges are all iterable.
- key-value pair
- A single entry in a dictionary, consisting of a key (used for lookup) and its associated value.
- list
- An ordered, mutable collection of values. Created with [] or list().
- mutable
- An object that can be changed after creation. Lists, dictionaries, and sets are mutable.
- set
- An unordered collection of unique elements. Supports mathematical set operations like union and intersection.
- slice
- A way to extract a portion of a sequence using start:stop:step notation.
- tuple
- An ordered, immutable collection of values. Created with () or tuple().
- tuple unpacking
- Assigning each element of a tuple (or other iterable) to a separate variable in one statement. Example: a, b, c = (1, 2, 3).
- zero-based indexing
- A numbering convention where the first element is at position 0, the second at position 1, and so on.