15  Functions & Modules

So far, you’ve been writing scripts that execute from top to bottom. That works for short, one-off tasks, but it doesn’t scale. When you find yourself copying the same block of code with minor changes, or when your script grows long enough that you lose track of what each section does, you need functions.

A function is a named, reusable block of code that takes inputs, does some work, and (optionally) returns a result. Functions are how you break a complex task into manageable pieces, name those pieces clearly, and compose them into larger workflows. By the end of this chapter, you’ll also learn to organize functions into modules (Python files that you can import and reuse across projects) and to document your functions with docstrings that make them usable by others (including your future self).

15.1 Defining Functions

You define a function with the def keyword, followed by a name, parameters in parentheses, and a colon. The function body is indented:

first_function.py
def calculate_revenue(unit_price, quantity):
    return unit_price * quantity

revenue = calculate_revenue(18.00, 10)
print(revenue)  # 180.0

The return statement sends a value back to the caller. Without an explicit return, a function returns None:

no_return.py
def greet(name):
    print(f"Hello, {name}!")

result = greet("Alice")
print(result)  # None

This distinction matters. Functions that return a value can be used in expressions, assigned to variables, and composed with other functions. Functions that only print produce output for humans but don’t create values that other code can use. In general, prefer returning values over printing them. Let the caller decide what to do with the result.
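As a small illustration of why returned values compose well, here is a sketch that combines two calls and passes the result to a second function. The apply_tax helper and the 0.06 rate are assumptions made up for this example; they are not part of the chapter's code.

```python
def calculate_revenue(unit_price, quantity):
    """Return the revenue for one line item."""
    return unit_price * quantity


def apply_tax(amount, rate):
    """Return the amount with tax added (hypothetical helper)."""
    return amount * (1 + rate)


# Returned values can be stored, combined, and passed to other functions.
subtotal = calculate_revenue(18.00, 10) + calculate_revenue(19.00, 5)
total = apply_tax(subtotal, 0.06)

# The caller decides how (and whether) to display the result.
print(f"Subtotal: {subtotal:.2f}")
print(f"Total with tax: {total:.2f}")
```

None of this would be possible if calculate_revenue only printed its answer: the caller would receive None and have nothing to add up.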

15.1.1 Functions as Units of Work

A good function does one thing, does it well, and has a name that makes its purpose obvious. Compare these two approaches:

bad_function.py
def process(data, mode):
    """Does too many things depending on the mode."""
    if mode == "clean":
        # 20 lines of cleaning logic
        ...
    elif mode == "analyze":
        # 30 lines of analysis logic
        ...
    elif mode == "report":
        # 25 lines of reporting logic
        ...

good_functions.py
def clean_order_data(raw_orders):
    """Remove invalid entries and standardize field names."""
    ...

def compute_revenue_by_category(orders):
    """Sum revenue grouped by product category."""
    ...

def format_revenue_report(category_totals):
    """Format category totals as a printable table."""
    ...

The second approach has three focused functions, each with a clear name and a single responsibility. They’re easier to test, easier to debug, and easier to reuse.

15.2 Docstrings

A docstring is the first string literal in a function (or module, or class). Python treats it specially: the help() function displays it, editors show it on hover, and documentation tools extract it automatically.

docstring_basic.py
def calculate_revenue(unit_price, quantity):
    """Calculate the total revenue for a line item."""
    return unit_price * quantity

In the REPL, help(calculate_revenue) will display:

output
Help on function calculate_revenue in module __main__:

calculate_revenue(unit_price, quantity)
    Calculate the total revenue for a line item.

15.2.1 The Google Docstring Convention

For functions with multiple parameters, return values, or potential errors, a one-line docstring isn’t enough. This book uses the Google convention, one of the most widely used formats:

google_docstring.py
def calculate_revenue(
    orders: list[dict],
    category: str | None = None,
) -> float:
    """Calculate total revenue, optionally filtered by category.

    Computes the sum of unit_price * quantity for each order.
    If a category is specified, only orders matching that category
    are included.

    Args:
        orders: List of order dictionaries. Each must have
            'category', 'unit_price', and 'quantity' keys.
        category: If provided, filter to this category only.
            If None, include all orders.

    Returns:
        Total revenue as a float.

    Raises:
        ValueError: If the orders list is empty.
    """
    if not orders:
        raise ValueError("Orders list cannot be empty.")

    if category is not None:
        orders = [o for o in orders if o["category"] == category]

    return sum(o["unit_price"] * o["quantity"] for o in orders)

The docstring above has four parts: a summary line followed by three sections, each of which is optional:

Args describes each parameter: its name, type (if not already annotated), and what it represents. If a parameter has a default value, document what the default means.

Returns describes what the function gives back. For functions that return None, you can omit this section.

Raises lists the exceptions the function might raise and under what conditions.

The summary line (first line of the docstring) should be a concise, imperative statement of what the function does: “Calculate total revenue,” not “This function calculates total revenue.”
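To see how the documented contract looks from the caller’s side, here is a minimal sketch that calls a condensed copy of calculate_revenue. The sample_orders data is invented for illustration, and the docstring is shortened only to keep the example compact.

```python
def calculate_revenue(orders, category=None):
    """Calculate total revenue, optionally filtered by category.

    Raises:
        ValueError: If the orders list is empty.
    """
    if not orders:
        raise ValueError("Orders list cannot be empty.")
    if category is not None:
        orders = [o for o in orders if o["category"] == category]
    return sum(o["unit_price"] * o["quantity"] for o in orders)


# Hypothetical sample data for illustration.
sample_orders = [
    {"category": "Beverages", "unit_price": 18.00, "quantity": 10},
    {"category": "Produce", "unit_price": 23.25, "quantity": 15},
]

all_revenue = calculate_revenue(sample_orders)
beverage_revenue = calculate_revenue(sample_orders, category="Beverages")
print(all_revenue)       # 528.75
print(beverage_revenue)  # 180.0

# Because the Raises section documents ValueError, callers know to handle it.
try:
    calculate_revenue([])
except ValueError as err:
    error_message = str(err)
    print(f"Could not compute revenue: {error_message}")
```

Each line of the call site maps back to a docstring section: the arguments follow Args, the printed numbers follow Returns, and the try/except follows Raises.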

Tip: Write the docstring first

Try writing the docstring before you write the function body. Describing what a function should do, what it takes as input, and what it returns forces you to think about the interface before you worry about the implementation. If the docstring is hard to write, the function might be doing too much.

15.2.2 Module Docstrings

A module docstring is the first string in a .py file. It describes what the file contains and why:

northwind_utils.py
"""Utility functions for Northwind order processing.

This module provides functions for reading, filtering, and
summarizing Northwind product and order data from CSV files.
"""

def calculate_revenue(unit_price, quantity):
    ...

When someone runs help(northwind_utils) after importing your module, they’ll see this docstring along with a list of the module’s functions.

15.3 Arguments in Depth

Python functions support several argument-passing patterns that give you flexibility in how you design interfaces.

15.3.1 Positional and Keyword Arguments

When you call a function, you can pass arguments by position or by name:

argument_styles.py
def create_product(name, price, category):
    return {"name": name, "price": price, "category": category}

# Positional: arguments matched by order
create_product("Chai", 18.00, "Beverages")

# Keyword: arguments matched by name (order doesn't matter)
create_product(category="Beverages", name="Chai", price=18.00)

# Mixed: positional first, then keyword
create_product("Chai", price=18.00, category="Beverages")

Keyword arguments make function calls more readable, especially when a function has many parameters or when the meaning of a positional argument isn’t obvious from context.

15.3.2 Default Values

Parameters can have default values, making them optional:

defaults.py
def format_price(amount, currency="USD", decimals=2):
    symbols = {"USD": "$", "EUR": "€", "GBP": "£"}
    symbol = symbols.get(currency, currency)
    return f"{symbol}{amount:,.{decimals}f}"

format_price(1234.5)                     # "$1,234.50"
format_price(1234.5, currency="EUR")     # "€1,234.50"
format_price(1234.5, decimals=0)         # "$1,234"

Warning: The mutable default argument trap

Never use a mutable value (like a list or dictionary) as a default argument:

mutable_default_bad.py
def add_item(item, items=[]):  # BUG: the list is shared across calls!
    items.append(item)
    return items

add_item("Chai")   # ["Chai"]
add_item("Chang")  # ["Chai", "Chang"] ← Where did "Chai" come from?

Default values are evaluated once when the function is defined, not each time it’s called. So every call shares the same list object. The fix is to use None as the default and create a new list inside the function:

mutable_default_fixed.py
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items
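
A quick check of the corrected version, as a sketch: each call without an items argument now gets its own fresh list, while a list passed in explicitly is still updated in place. The cart variable is just an illustrative name.

```python
def add_item(item, items=None):
    if items is None:
        items = []  # A fresh list is created on every call
    items.append(item)
    return items


first = add_item("Chai")
second = add_item("Chang")
print(first)   # ['Chai']
print(second)  # ['Chang']

# Passing an existing list still works; that list is modified in place.
cart = ["Tofu"]
add_item("Chai", cart)
print(cart)    # ['Tofu', 'Chai']
```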

15.3.3 *args and **kwargs

When you don’t know how many arguments a function will receive, use *args for positional arguments and **kwargs for keyword arguments:

args_kwargs.py
def summarize(*values):
    """Summarize any number of numeric values."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": sum(values) / len(values) if values else 0,
    }

summarize(1, 2, 3)        # {"count": 3, "sum": 6, "mean": 2.0}
summarize(10, 20, 30, 40) # {"count": 4, "sum": 100, "mean": 25.0}

*args collects extra positional arguments into a tuple. **kwargs collects extra keyword arguments into a dictionary. You’ll encounter these most often when reading library code rather than writing your own, but knowing how they work helps you understand function signatures in documentation.
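
The example above only exercises *args, so here is a companion sketch for **kwargs. The function names build_product and show_arguments are hypothetical and exist only for this illustration.

```python
def build_product(name, **attributes):
    """Build a product dictionary from a name plus any extra keyword arguments."""
    product = {"name": name}
    product.update(attributes)  # attributes is an ordinary dict
    return product


def show_arguments(*args, **kwargs):
    """Return the collected arguments so their types can be inspected."""
    return args, kwargs


chai = build_product("Chai", price=18.00, category="Beverages")
print(chai)  # {'name': 'Chai', 'price': 18.0, 'category': 'Beverages'}

positional, keyword = show_arguments(1, 2, unit="kg")
print(positional)  # (1, 2)
print(keyword)     # {'unit': 'kg'}
```

The names args and kwargs are only a convention; the * and ** prefixes are what tell Python to collect the extra arguments.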

15.3.4 Keyword-Only Arguments

Placing a bare * in the parameter list forces everything after it to be keyword-only:

keyword_only.py
def query_products(category, *, min_price=0, in_stock_only=False):
    ...

# These work:
query_products("Beverages", min_price=10, in_stock_only=True)

# This fails:
query_products("Beverages", 10, True)  # TypeError

Keyword-only arguments prevent positional confusion. When a function has several options, requiring them to be named makes the call site much more readable.
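
If you want to watch the failure happen without crashing your script, you can catch the TypeError. This is a sketch: the function body below is a placeholder that simply echoes its settings, since the chapter leaves the real body unspecified.

```python
def query_products(category, *, min_price=0, in_stock_only=False):
    """Return the query settings (placeholder body for illustration)."""
    return {
        "category": category,
        "min_price": min_price,
        "in_stock_only": in_stock_only,
    }


# Naming the options works.
settings = query_products("Beverages", min_price=10, in_stock_only=True)
print(settings)

# Passing the options by position is rejected before the body runs.
try:
    query_products("Beverages", 10, True)
except TypeError as err:
    message = str(err)
    print(f"TypeError: {message}")
```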

15.3.5 Exercises

  1. Write a function compute_line_total(unit_price, quantity, discount=0.0) that returns the line total after discount. Include a Google-style docstring. Test it with compute_line_total(18.00, 10) and compute_line_total(18.00, 10, discount=0.15).

  2. Predict the output of this code without running it:

solution.py
def add_item(item, items=[]):
    items.append(item)
    return items

print(add_item("Chai"))
print(add_item("Chang"))

Why does the second call return ["Chai", "Chang"] instead of ["Chang"]? Write the corrected version.

  3. Write a function format_product(name, price, *, currency="$", decimals=2) where currency and decimals are keyword-only arguments. Call it three different ways: with defaults, with a custom currency, and with zero decimal places.

  4. What happens if you call create_product("Chai", 18.00, "Beverages") vs. create_product(price=18.00, name="Chai", category="Beverages") for a function with positional parameters (name, price, category)? Are the results the same?

1. Show the code:

solution.py
def compute_line_total(unit_price: float, quantity: int, discount: float = 0.0) -> float:
    """Compute the total for an order line item after discount.

    Args:
        unit_price: Price per unit in dollars.
        quantity: Number of units ordered.
        discount: Discount as a decimal (0.15 = 15%). Defaults to 0.0.

    Returns:
        Total price after discount.
    """
    return unit_price * quantity * (1 - discount)

compute_line_total(18.00, 10)                # 180.0
compute_line_total(18.00, 10, discount=0.15) # 153.0

2. The second call returns ["Chai", "Chang"] because the default list [] is created once at function definition time and shared across all calls. Fix:

solution.py
def add_item(item, items=None):
    if items is None:
        items = []
    items.append(item)
    return items

3. Show the code:

solution.py
def format_product(name, price, *, currency="$", decimals=2):
    return f"{currency}{price:,.{decimals}f}"

format_product("Chai", 18.00)                           # "$18.00"
format_product("Chai", 18.00, currency="€")             # "€18.00"
format_product("Chai", 18.00, currency="$", decimals=0) # "$18"

4. Yes, both produce the same result. Positional arguments are matched by order, keyword arguments by name. The second form is more readable when the meaning of arguments isn’t obvious from context.

15.4 Scope and Namespaces

When you create a variable inside a function, it exists only within that function. This is called local scope:

scope.py
def compute_tax(amount):
    rate = 0.06   # Local variable
    return amount * rate

compute_tax(100)  # 6.0
print(rate)       # NameError: name 'rate' is not defined

The variable rate is created inside compute_tax and destroyed when the function returns. Code outside the function can’t see it. This is a feature, not a limitation: it means functions don’t accidentally interfere with each other’s variables.

15.4.1 The LEGB Rule

When Python encounters a name, it looks for it in four scopes, in this order:

  1. Local: variables defined in the current function.
  2. Enclosing: variables in any enclosing function (relevant for nested functions).
  3. Global: variables defined at the module level (the top of your .py file).
  4. Built-in: Python’s built-in names like print, len, range.

Python uses the first match it finds. If the name doesn’t exist in any scope, you get a NameError.

legb.py
tax_rate = 0.06  # Global scope

def compute_total(subtotal):
    # 'subtotal' is in Local scope
    # 'tax_rate' is found in Global scope
    return subtotal * (1 + tax_rate)

print(compute_total(100))  # 106.0
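
The example above touches only the Local and Global scopes. The following sketch resolves one name from each of the four scopes in a single expression. The make_total_calculator function and the shipping parameter are hypothetical names introduced for this illustration.

```python
tax_rate = 0.06  # Global scope


def make_total_calculator(shipping):
    # 'shipping' is local to make_total_calculator, which makes it
    # an Enclosing-scope name from the point of view of compute_total.
    def compute_total(subtotal):
        # subtotal -> Local, shipping -> Enclosing,
        # tax_rate -> Global, round -> Built-in
        return round(subtotal * (1 + tax_rate) + shipping, 2)

    return compute_total


with_shipping = make_total_calculator(5.00)
print(with_shipping(100))  # 111.0
```

The inner function keeps access to shipping even after make_total_calculator has returned, which is the behavior the glossary describes under closure.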

Warning: Avoid global

Python has a global keyword that lets a function modify a global variable. Don’t use it. It makes your code harder to reason about because any function could change any variable at any time. Instead, pass values into functions as arguments and get values out via return.

15.4.2 Variable Shadowing

A local variable can shadow a global one with the same name:

shadowing.py
name = "Global Name"

def greet():
    name = "Local Name"  # This creates a NEW local variable
    print(name)          # "Local Name"

greet()
print(name)  # "Global Name" ← unchanged

The function’s name is a completely separate variable from the global name. Shadowing is a common source of subtle bugs, especially when you accidentally reuse a name like list, sum, or input that shadows a built-in:

shadowing_builtin.py
# Don't do this!
list = [1, 2, 3]          # Shadows the built-in list() function
new_list = list(range(5)) # TypeError: 'list' object is not callable

15.4.3 Exercises

  1. Predict the output of this code:

solution.py
x = 10

def outer():
    x = 20
    def inner():
        print(x)
    inner()

outer()
print(x)

  2. Why does this code raise an error?

solution.py
count = 0
def increment():
    count = count + 1
    return count
increment()

How would you fix it without using the global keyword? (Hint: pass count as an argument and return the new value.)

  3. A colleague writes list = [1, 2, 3] at the top of their script. Later, list(range(5)) fails with a TypeError. Explain what went wrong and how to fix it.

1. The inner() function prints 20 (from the enclosing scope). The final print(x) prints 10 (from the global scope). The x = 20 in outer() creates a local variable that shadows the global one.

2. Python sees count = count + 1 and treats count as a local variable (because it’s being assigned to). But count + 1 tries to read the local count before it’s been assigned, causing UnboundLocalError. Fix:

solution.py
def increment(count):
    return count + 1

count = 0
count = increment(count)  # count is now 1

3. Assigning list = [1, 2, 3] shadows the built-in list() function. Now list refers to the [1, 2, 3] object, not the constructor. Fix by choosing a different variable name like numbers = [1, 2, 3] and deleting (or not using) the shadowed name.

15.5 Functions as Objects

In Python, functions aren’t special. They’re objects, just like integers, strings, and lists. You can assign them to variables, pass them as arguments, and store them in collections:

function_objects.py
def double(x):
    return x * 2

def triple(x):
    return x * 3

# Assign a function to a variable
transform = double
print(transform(5))  # 10

# Put functions in a list
operations = [double, triple]
for op in operations:
    print(op(10))  # 20, then 30

15.5.1 Passing Functions as Arguments

The most practical use of functions-as-objects is passing one function to another. You’ve already seen this with sorted():

sorted_key.py
products = [
    {"name": "Tofu", "price": 23.25},
    {"name": "Chai", "price": 18.00},
    {"name": "Chang", "price": 19.00},
]

# Sort by price using a key function
by_price = sorted(products, key=lambda p: p["price"])

The key argument tells sorted() how to extract a comparison value from each element. The lambda creates a small anonymous function inline.
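
A key function does not have to be a lambda. As a sketch, the same sort can use a named function, and the same function can be reused with other built-ins that accept key, such as max(). The name get_price is an assumption chosen for this example.

```python
def get_price(product):
    """Return the value to compare products by."""
    return product["price"]


products = [
    {"name": "Tofu", "price": 23.25},
    {"name": "Chai", "price": 18.00},
    {"name": "Chang", "price": 19.00},
]

# Pass the function itself (no parentheses), not the result of calling it.
by_price = sorted(products, key=get_price)
print([p["name"] for p in by_price])  # ['Chai', 'Chang', 'Tofu']

# reverse=True sorts from highest to lowest.
most_expensive_first = sorted(products, key=get_price, reverse=True)
print(most_expensive_first[0]["name"])  # Tofu

# max() accepts the same kind of key function.
priciest = max(products, key=get_price)
print(priciest["name"])  # Tofu
```

Note that sorted() returns a new list and leaves products in its original order.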

15.5.2 Lambda Expressions

A lambda is a one-expression function without a name:

lambda.py
square = lambda x: x ** 2
square(5)  # 25

# Equivalent to:
def square(x):
    return x ** 2

Lambdas are useful as arguments to functions like sorted(), map(), and filter(). They’re not useful for anything complex, because they’re limited to a single expression.

Tip: Comprehensions vs. map() and filter()

Python provides map() and filter() for transforming and filtering sequences:

map_filter.py
prices = [18.00, 19.00, 10.00, 23.25]

# map() applies a function to each element
with_tax = list(map(lambda p: p * 1.06, prices))

# filter() keeps elements where the function returns True
expensive = list(filter(lambda p: p > 15, prices))

Comprehensions are almost always clearer:

comprehension_equivalent.py
with_tax = [p * 1.06 for p in prices]
expensive = [p for p in prices if p > 15]

Prefer comprehensions. Use map() and filter() only when you have an existing named function to pass and the comprehension would add no clarity.

15.6 Generators

What happens when you need to process a million records, but you don’t have enough memory to hold them all in a list at once? Generators solve this problem by producing values one at a time, on demand, instead of computing everything upfront.

15.6.1 The Memory Problem

Consider reading a large file and processing each line:

memory_problem.py
# This loads the ENTIRE file into memory as a list
lines = open("huge_file.csv").readlines()

# If the file is 10 GB, you just used 10 GB of RAM
for line in lines:
    process(line)

If the file is large, this approach exhausts your memory. A generator processes one line at a time, never holding the whole file in memory.
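
For comparison, here is a sketch of the lazy alternative: looping over the file object itself reads one line at a time instead of building a list. The example writes a tiny stand-in file to a temporary directory so that it can run anywhere; the file name and contents are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # A small stand-in for the huge file, so the example is runnable.
    path = Path(tmp_dir) / "sample.csv"
    path.write_text("id,price\n1,18.00\n2,19.00\n3,23.25\n")

    line_count = 0
    with open(path) as f:   # 'with' closes the file when the block ends
        for line in f:      # the file object hands over one line at a time
            line_count += 1

print(line_count)  # 4
```

Only the current line needs to be held in memory at any moment, so the same loop works whether the file has four lines or four hundred million.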

15.6.2 Generator Functions with yield

A generator function uses yield instead of return. Each time it yields a value, it pauses and gives the value to the caller. When the caller asks for the next value, the function resumes exactly where it left off:

generator.py
def count_up_to(n):
    """Generate integers from 1 to n."""
    i = 1
    while i <= n:
        yield i
        i += 1

# The function doesn't execute yet. It returns a generator object.
counter = count_up_to(5)

# Each call to next() resumes the function and gets the next value
print(next(counter))  # 1
print(next(counter))  # 2
print(next(counter))  # 3

# Or use it in a for loop (the normal way)
for num in count_up_to(5):
    print(num)

The key insight is that yield pauses the function, and next() (or a for loop) resumes it. The function maintains its state between yields: all its local variables are preserved.
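
One detail the example leaves out is what happens when a generator runs out of values. As a sketch: calling next() on an exhausted generator raises StopIteration, which is the signal a for loop uses to know when to stop.

```python
def count_up_to(n):
    """Generate integers from 1 to n."""
    i = 1
    while i <= n:
        yield i
        i += 1


counter = count_up_to(2)
print(next(counter))  # 1
print(next(counter))  # 2

# The while loop has finished, so there is nothing left to yield.
try:
    next(counter)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)  # True

# next() also accepts a default to return instead of raising.
print(next(counter, None))  # None
```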

15.6.3 Generator Expressions

Just as list comprehensions create lists, generator expressions create generators. The only difference is parentheses instead of brackets:

genexpr.py
# List comprehension: creates the entire list in memory
squares_list = [n ** 2 for n in range(1_000_000)]

# Generator expression: creates values one at a time
squares_gen = (n ** 2 for n in range(1_000_000))

The list uses memory proportional to a million elements. The generator uses almost no memory, because it computes each square only when asked.

Generator expressions work anywhere an iterable is expected:

genexpr_sum.py
# Sum of squares without building a list
total = sum(n ** 2 for n in range(1_000_000))
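
The low memory use comes with a trade-off worth seeing once: a generator can be consumed only a single time. A short sketch:

```python
squares = (n ** 2 for n in range(5))

first_pass = list(squares)
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# To iterate again, create a new generator (or build a list if you
# really need to reuse the values).
squares_again = (n ** 2 for n in range(5))
print(sum(squares_again))  # 30
```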

15.6.4 A Practical Generator

Here’s a generator that reads a CSV file line by line, yielding each row as a dictionary. It processes the file lazily, never loading more than one row into memory:

csv_generator.py
def read_csv_rows(filepath):
    """Yield each row of a CSV file as a dictionary.

    Args:
        filepath: Path to the CSV file.

    Yields:
        A dictionary mapping column names to values for each row.
    """
    with open(filepath) as f:
        headers = f.readline().strip().split(",")
        for line in f:
            values = line.strip().split(",")
            yield dict(zip(headers, values))

# Process order details without loading the entire file
for row in read_csv_rows("order_details.csv"):
    if float(row["unitPrice"]) > 20:
        print(row["productID"])

You’ll revisit file processing in much more detail in Chapter 16. For now, the important concept is that generators let you work with data that’s larger than your available memory, by producing values one at a time.

15.6.5 Exercises

  1. Write a generator expression that yields the cube of each number from 1 to 100. Use sum() to compute the total without creating a list. Then write the equivalent list comprehension and compare what type() returns for each.

  2. Predict what this code prints:

example.py
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
print(next(gen))
print(next(gen))
print(list(gen))

  3. Explain why sum(n ** 2 for n in range(1_000_000)) uses almost no memory, while sum([n ** 2 for n in range(1_000_000)]) uses significant memory. Which should you prefer?

1. Show the code:

solution.py
total = sum(n ** 3 for n in range(1, 101))   # Generator expression
total_list = sum([n ** 3 for n in range(1, 101)])  # List comprehension

type(n ** 3 for n in range(1, 101))   # <class 'generator'>
type([n ** 3 for n in range(1, 101)]) # <class 'list'>

Both produce the same sum (25502500), but the generator never creates the full list in memory.

2. The first next(gen) prints 3, the second prints 2, and list(gen) prints [1]. The generator remembers its state between next() calls. By the time list() is called, it has already yielded 3 and 2, so only 1 remains.

3. The generator expression (n ** 2 for ...) computes values one at a time and feeds them to sum(), never storing more than one value in memory. The list comprehension [n ** 2 for ...] creates a list of 1 million integers in memory first, then sums it. Prefer the generator expression when you only need to iterate once.

15.7 Modules and Imports

A module is a .py file that contains functions, variables, and other code that you can import and reuse. Every Python file is a module. When you write import math, you’re importing Python’s built-in math module.

15.7.1 Import Syntax

Python provides several ways to import:

imports.py
# Import the entire module
import math
print(math.sqrt(16))    # 4.0

# Import specific items
from math import sqrt, pi
print(sqrt(16))          # 4.0
print(pi)                # 3.141592653589793

# Import with an alias
import statistics as stats
print(stats.mean([1, 2, 3]))  # 2.0

The import module form is usually preferred because it makes clear where each function comes from. When you see math.sqrt(), you know it’s from the math module. When you see just sqrt(), you have to remember which import brought it in.

15.7.2 What Happens When You Import

When Python encounters import mymodule, it:

  1. Finds the file mymodule.py (or looks in the standard library and installed packages).
  2. Executes the entire file from top to bottom.
  3. Creates a module object containing everything the file defined.
  4. Binds that object to the name mymodule in your current scope.

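The second step is important: all code at the module level runs when you import. This means if your module has a print("Hello!") at the top level, it prints whenever a program first imports it. (Within a single run, Python caches the loaded module, so later imports of the same module reuse it without executing the file again.)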
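
The following sketch makes the import steps observable. It writes a throwaway module to a temporary directory, imports it twice, and captures what gets printed. Within a single program run, Python caches a loaded module, so the top-level code runs on the first import only. The module name demo_greeting is hypothetical, and importlib.import_module is used here simply as the function form of the import statement.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # A tiny module whose only content is top-level code.
    Path(tmp_dir, "demo_greeting.py").write_text('print("demo_greeting loaded")\n')
    sys.path.insert(0, tmp_dir)      # let Python find the new file
    importlib.invalidate_caches()    # make sure the new file is noticed

    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            first = importlib.import_module("demo_greeting")   # executes the file
            second = importlib.import_module("demo_greeting")  # reuses the cached module
    finally:
        # Clean up so the demo leaves no trace in this session.
        sys.path.remove(tmp_dir)
        sys.modules.pop("demo_greeting", None)

output = captured.getvalue()
same_object = first is second
print(repr(output))   # the message appears once, not twice
print(same_object)    # True
```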

15.7.3 The if __name__ == "__main__" Pattern

To prevent top-level code from running on import, wrap it in this guard:

my_module.py
"""Utility functions for temperature conversion."""

def fahrenheit_to_celsius(f):
    """Convert Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9

def celsius_to_fahrenheit(c):
    """Convert Celsius to Fahrenheit."""
    return c * 9 / 5 + 32

# This only runs when the file is executed directly,
# not when it's imported by another file.
if __name__ == "__main__":
    temp_f = 72
    temp_c = fahrenheit_to_celsius(temp_f)
    print(f"{temp_f}°F = {temp_c:.1f}°C")

When you run uv run my_module.py, Python sets __name__ to "__main__", so the code under the guard executes. When another file runs import my_module, __name__ is set to "my_module", and the guard block is skipped. This pattern lets a file serve double duty: as both an importable module and a standalone script.

15.7.4 Building Your Own Modules

Organizing code into modules is one of the most important habits in professional Python development. Consider this structure:

output
northwind-analysis/
├── pyproject.toml
├── northwind_utils.py     # Your reusable functions
└── analyze_revenue.py     # A script that uses them

northwind_utils.py
"""Utility functions for Northwind order processing.

This module provides functions for filtering, aggregating, and
formatting Northwind product and order data.
"""

def filter_by_category(orders, category):
    """Filter a list of orders to a specific category.

    Args:
        orders: List of order dictionaries with a 'category' key.
        category: The category name to filter by.

    Returns:
        A new list containing only orders in the specified category.
    """
    return [o for o in orders if o["category"] == category]


def compute_revenue(orders):
    """Compute total revenue from a list of orders.

    Each order must have 'unit_price' and 'quantity' keys.

    Args:
        orders: List of order dictionaries.

    Returns:
        Total revenue as a float.
    """
    return sum(o["unit_price"] * o["quantity"] for o in orders)


def format_currency(amount, symbol="$"):
    """Format a number as currency.

    Args:
        amount: The numeric value to format.
        symbol: The currency symbol to prepend. Defaults to '$'.

    Returns:
        A formatted string like '$1,234.56'.
    """
    return f"{symbol}{amount:,.2f}"

analyze_revenue.py
"""Analyze Northwind revenue by category."""

from northwind_utils import compute_revenue, filter_by_category, format_currency

# Sample data (in the next chapter, we'll read this from files)
orders = [
    {"product": "Chai", "category": "Beverages", "unit_price": 18.00, "quantity": 10},
    {"product": "Chang", "category": "Beverages", "unit_price": 19.00, "quantity": 5},
    {"product": "Tofu", "category": "Produce", "unit_price": 23.25, "quantity": 15},
]

beverages = filter_by_category(orders, "Beverages")
revenue = compute_revenue(beverages)
print(f"Beverages revenue: {format_currency(revenue)}")

The functions in northwind_utils.py are reusable. You can import them into any script: a different analysis, a report generator, or a test suite. By giving each function a clear name, a docstring, and a focused purpose, you’ve created building blocks that others can use without reading the implementation.

15.7.5 Standard Library Highlights

Python ships with a large standard library. Here are a few modules you’ll use frequently:

Table 15.1: Common standard library modules

Module      Purpose                    Example
----------  -------------------------  ----------------------------
math        Mathematical functions     math.sqrt(16), math.log(100)
random      Random number generation   random.choice([1, 2, 3])
statistics  Basic statistics           statistics.mean([1, 2, 3])
datetime    Date and time handling     datetime.date.today()
pathlib     File system paths          Path("data/orders.csv")
csv         CSV file reading/writing   csv.DictReader(file)
json        JSON reading/writing       json.load(file)

You’ll use pathlib, csv, and json extensively in the next chapter.

15.8 Putting It Together: A Northwind Utility Module

Let’s build a more complete version of the utility module, bringing together functions, docstrings, comprehensions, and generators:

northwind_utils.py
"""Utility functions for Northwind order processing.

Provides functions for filtering, aggregating, and formatting
Northwind order data. Designed to be imported by analysis scripts.
"""


def filter_active_products(products):
    """Return only products that are not discontinued.

    Args:
        products: List of product dictionaries with a
            'discontinued' key.

    Returns:
        A new list containing only active (non-discontinued) products.
    """
    return [p for p in products if not p["discontinued"]]


def group_by_category(orders):
    """Group orders by their product category.

    Args:
        orders: List of order dictionaries with a 'category' key.

    Returns:
        A dictionary mapping category names to lists of orders.
    """
    groups = {}
    for order in orders:
        category = order["category"]
        if category not in groups:
            groups[category] = []
        groups[category].append(order)
    return groups


def compute_revenue(orders):
    """Compute total revenue from a list of orders.

    Args:
        orders: List of order dictionaries, each with
            'unit_price' and 'quantity' keys.

    Returns:
        Total revenue as a float.
    """
    return sum(o["unit_price"] * o["quantity"] for o in orders)


def revenue_by_category(orders):
    """Compute revenue for each product category.

    Args:
        orders: List of order dictionaries with 'category',
            'unit_price', and 'quantity' keys.

    Returns:
        A dictionary mapping category names to total revenue.
    """
    groups = group_by_category(orders)
    return {
        category: compute_revenue(category_orders)
        for category, category_orders in groups.items()
    }


def format_report(category_totals):
    """Format category revenue totals as a printable report.

    Args:
        category_totals: Dictionary mapping category names
            to revenue amounts.

    Returns:
        A formatted string with aligned columns.
    """
    sorted_items = sorted(
        category_totals.items(),
        key=lambda pair: pair[1],
        reverse=True,
    )

    lines = [f"{'Category':<20} {'Revenue':>12}"]
    lines.append(f"{'-' * 20} {'-' * 12}")
    for category, revenue in sorted_items:
        lines.append(f"{category:<20} ${revenue:>11,.2f}")

    total = sum(category_totals.values())
    lines.append(f"{'-' * 20} {'-' * 12}")
    lines.append(f"{'TOTAL':<20} ${total:>11,.2f}")

    return "\n".join(lines)


if __name__ == "__main__":
    sample_orders = [
        {"product": "Chai", "category": "Beverages", "unit_price": 18.00, "quantity": 10},
        {"product": "Chang", "category": "Beverages", "unit_price": 19.00, "quantity": 5},
        {"product": "Aniseed Syrup", "category": "Condiments", "unit_price": 10.00, "quantity": 20},
        {"product": "Tofu", "category": "Produce", "unit_price": 23.25, "quantity": 15},
    ]

    totals = revenue_by_category(sample_orders)
    print(format_report(totals))

Running this directly (uv run northwind_utils.py) prints the report. Importing it (from northwind_utils import revenue_by_category) gives you access to the individual functions without running the demo.

Exercises

Function Design

  1. Write a function classify_stock_level that takes units_in_stock and reorder_level as arguments and returns a stock classification string. Use these thresholds: "Critical" if stock is zero, "Low" if stock is at or below the reorder level, "Adequate" if stock is above the reorder level but below three times the reorder level, and "Overstocked" if stock is at or above three times the reorder level. Include a Google-style docstring.

  2. Write a function format_product_line that takes a product name, price, and quantity and returns a formatted string like "Chai ............. $18.00 x 10". Use f-string alignment to make the output neat.

  3. Refactor the order processing script from Chapter 13 into functions. Each logical step (filtering, grouping, computing totals, formatting output) should be its own function.

Generator Practice

  1. Write a generator function fibonacci() that yields Fibonacci numbers indefinitely. Use it with a for loop and break to print the first 20 terms.

  2. Write a generator expression that yields the square root of each number from 1 to 1,000,000. Use sum() to add them up without creating a list.

  3. Compare memory usage: create a list comprehension [n ** 2 for n in range(10_000_000)] and a generator expression (n ** 2 for n in range(10_000_000)). Which one can you create without running out of memory? (Hint: use import sys; sys.getsizeof() on each.)

Module Building

  1. Create a file called conversions.py that contains functions for common engineering unit conversions (PSI to kPa, Fahrenheit to Celsius, miles to kilometers, etc.). Include a module docstring and Google-style docstrings for each function. Write a separate script that imports and uses these functions.

  2. Add to your northwind_utils.py: a function that takes a list of products and returns the top N products by stock value (price × quantity). Include a default of n=5.

Summary

Functions are the primary organizational tool in Python. A well-designed function has a clear name, a focused purpose, documented parameters and return values, and no side effects beyond what its docstring describes. Docstrings, particularly the Google convention, make functions self-documenting: anyone can call help() on your function and understand how to use it.

Generators extend the idea of functions with yield, producing values lazily instead of computing them all at once. They’re essential for processing data that doesn’t fit in memory, a common situation in real-world engineering data work.

Modules turn .py files into importable, reusable libraries. The if __name__ == "__main__" pattern lets a file work both as a standalone script and as an importable module. These organizational techniques (functions, generators, and modules) are the foundation you’ll build on for every remaining chapter.

Glossary

argument
A value passed to a function when it’s called. Distinguished from a parameter, which is the variable name in the function definition.
closure
A function that captures variables from its enclosing scope. Created when a nested function references a variable from the outer function.
default value
A value assigned to a parameter in the function definition, used when the caller doesn’t provide that argument.
docstring
The first string literal in a function, class, or module. Used by help(), editors, and documentation tools to describe the item’s purpose and usage.
generator
A function that uses yield instead of return, producing values one at a time on demand. Generator expressions use parentheses: (expr for x in iterable).
keyword argument
An argument passed by name: f(x=10). Can be in any order.
lambda
A small anonymous function defined with the lambda keyword. Limited to a single expression.
LEGB rule
The order Python searches for names: Local, Enclosing, Global, Built-in.
module
A .py file that can be imported by other Python code. Contains functions, variables, classes, and other definitions.
namespace
A mapping from names to objects. Each scope (local, global, built-in) has its own namespace.
parameter
A variable name in a function definition that receives a value when the function is called.
positional argument
An argument matched to a parameter by position: the first argument goes to the first parameter, and so on.
scope
The region of code where a variable is accessible. Python has local, enclosing, global, and built-in scopes.
shadowing
When a local variable has the same name as a variable in an outer scope, hiding it within the local scope.
yield
A keyword that pauses a generator function and produces a value. The function resumes when the next value is requested.