22 Python Projects
As you build your data product, your code will grow beyond simple scripts. You’ll create modules, organize your analysis into reusable functions, manage dependencies, and eventually share your work with others. Project engineering is the discipline of organizing this growth.
In Chapter 12, you ran uv init to create your first Python project. That gave you a pyproject.toml, a .python-version file, a starter script, and just enough structure to start writing code. For the work you’ve done so far, that was sufficient. But when you push your Northwind analysis to GitHub and a colleague clones it, things can go wrong quickly. They run your script and get errors because they don’t have duckdb or polars installed. They install those packages, but they get a different version of Polars than you used, and the API has changed. Your code breaks on their machine even though it works perfectly on yours.
This is the problem that project engineering solves. A properly structured project declares its dependencies, pins their exact versions, configures its tools, documents its purpose, and can be reproduced by anyone with a single command. The tool that manages all of this is uv, which you’ve been using throughout the book. This chapter goes deep on everything uv can do, so your projects are professional and reproducible from the first commit.
22.1 From Basic to Professional
When you ran uv init my-project in Chapter 12, uv created this structure:
output
my-project/
├── pyproject.toml
├── .python-version
├── README.md
├── .gitignore
└── hello.py
That’s a simple project: a flat directory with a configuration file and some scripts. It’s fine for homework assignments, quick experiments, and standalone scripts. But when your project grows to include modules, notebooks, data files, and multiple dependencies, you need more structure.
uv supports three project initialization modes, each creating a different structure for a different purpose.
22.2 The Three uv init Modes
22.2.1 Simple Project: uv init
terminal
uv init my-scripts
output
my-scripts/
├── pyproject.toml
├── .python-version
├── README.md
├── .gitignore
└── hello.py
This is what you’ve been using. Scripts live at the top level. There’s no src/ directory and no package structure. You run scripts with uv run hello.py. This mode is ideal for one-off scripts, homework, and quick experiments.
22.2.2 Application Package: uv init --package
terminal
uv init --package northwind-analysis
output
northwind-analysis/
├── pyproject.toml
├── .python-version
├── README.md
├── .gitignore
└── src/
└── northwind_analysis/
└── __init__.py
The --package flag creates a proper Python package with a src/ layout. The code inside src/northwind_analysis/ is importable as a module, and the project can define CLI entry points (you’ll use this in Chapter 26). This mode is for tools and applications you want to distribute.
Notice that the project name uses hyphens (northwind-analysis) but the package directory uses underscores (northwind_analysis). This is a Python convention: hyphens are standard in project names (what you see on PyPI), and underscores are required in package names (what you use in import statements). uv handles this mapping automatically.
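The mapping itself is mechanical. Here is a simplified sketch of it in Python — an illustration only, not uv's actual implementation (the real rules come from the packaging ecosystem's name-normalization specs):

```python
# Sketch: derive an importable package name from a PyPI-style project name.
# Simplified for illustration -- real normalization follows the packaging specs.

def package_name(project_name: str) -> str:
    """Lowercase the project name and replace hyphens and dots with underscores."""
    return project_name.lower().replace("-", "_").replace(".", "_")

print(package_name("northwind-analysis"))  # northwind_analysis
```

The result is a valid Python identifier, which is why you can write `import northwind_analysis` even though the project is published as `northwind-analysis`.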
22.2.3 Library: uv init --library
terminal
uv init --library northwind-utils
output
northwind-utils/
├── pyproject.toml
├── .python-version
├── README.md
├── .gitignore
├── src/
│ └── northwind_utils/
│ ├── __init__.py
│ └── py.typed
└── tests/
└── __init__.py
The --library flag is similar to --package but configured for code that others will install as a dependency. It includes a tests/ directory and a py.typed marker file that tells type checkers the library ships with type information. This mode is for reusable code that other projects will uv add as a dependency.
22.2.4 When to Use Each
uv init mode

| Mode | Use When | Example |
|---|---|---|
| `uv init` | Quick scripts, homework, experiments | A one-off data cleaning script |
| `uv init --package` | Tools with CLI entry points, distributable apps | Your Northwind reporting tool |
| `uv init --library` | Reusable code others will import | A shared utilities package |
For the Northwind project in this book, --package is the right choice. You’re building an application with a CLI interface (Chapter 26), not a library for others to import.
22.3 pyproject.toml in Depth
The pyproject.toml file is the single source of truth for your project. It declares metadata, dependencies, tool configuration, and build instructions. Every tool in the Python ecosystem reads it. Here’s a comprehensive example for the Northwind project:
pyproject.toml
[project]
name = "northwind-analysis"
version = "0.1.0"
description = "Analytical workflows for the Northwind trading database."
readme = "README.md"
license = "MIT"
requires-python = ">=3.13"
authors = [
{ name = "Your Name", email = "you@example.com" },
]
dependencies = [
    "duckdb>=1.0",
    "polars>=1.0",
    "altair>=5.0",
    "xlsxwriter>=3.2",
    "typer>=0.12",
    "marimo>=0.9",
]
[dependency-groups]
dev = [
    "basedpyright>=1.20",
    "ruff>=0.8",
]
[project.scripts]
northwind = "northwind_analysis.cli:app"
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
Let’s walk through each section.
22.3.1 [project] Metadata
The [project] section contains human-readable information about your project. The name is what appears on PyPI if you publish it. The version follows semantic versioning (MAJOR.MINOR.PATCH). The requires-python field ensures that anyone installing your project has a compatible Python version.
22.3.2 [project.dependencies]
These are your runtime dependencies, the packages your code needs to execute. When someone runs uv sync on your project, uv installs exactly these packages (and their transitive dependencies).
Version specifiers control which versions are acceptable:
| Specifier | Meaning | Example |
|---|---|---|
| `>=1.0` | Any version 1.0 or higher | `"polars>=1.0"` |
| `~=1.2` | Compatible release (≥1.2, <2.0) | `"duckdb~=1.2"` |
| `==1.2.3` | Exactly this version | `"xlsxwriter==3.2.0"` |
For most projects, >= is the right choice. It allows updates while preventing downgrades below the version you’ve tested with. The uv.lock file (discussed below) pins the exact versions that were actually installed, so >= in pyproject.toml expresses your intent while uv.lock records the reality.
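You can make the specifier semantics concrete with a few lines of Python. This is a hand-rolled sketch for illustration only — real tools use the `packaging` library’s full implementation, which handles pre-releases, epochs, and other cases this toy version ignores:

```python
# Sketch: how the three specifier styles map onto version-tuple checks.
# Hand-rolled for illustration; real tools use the `packaging` library.

def accepts(specifier: str, version: str) -> bool:
    v = tuple(int(part) for part in version.split("."))

    if specifier.startswith(">="):
        floor = tuple(int(p) for p in specifier[2:].split("."))
        return v >= floor
    if specifier.startswith("=="):
        return version == specifier[2:]
    if specifier.startswith("~="):
        floor = tuple(int(p) for p in specifier[2:].split("."))
        # Compatible release: at least `floor`, and all components
        # except the last must stay the same (so ~=1.2 means <2.0).
        return v >= floor and v[: len(floor) - 1] == floor[:-1]
    raise ValueError(f"unsupported specifier: {specifier}")

print(accepts(">=1.0", "1.4.2"))   # True
print(accepts("~=1.2", "1.9.0"))   # True  (>=1.2, <2.0)
print(accepts("~=1.2", "2.0.0"))   # False
```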
22.3.3 [dependency-groups]
Development dependencies are tools you need during development but that your users don’t need. Ruff and basedpyright are development tools: they help you write and check code, but they aren’t imported by your application at runtime. Separating them into a dev group keeps your production dependencies lean.
terminal
uv add --dev ruff basedpyright
This adds them to the [dependency-groups] section. When a user installs your project, they get only the runtime dependencies. When a developer runs uv sync, they get everything.
Note that Marimo is a regular dependency (uv add marimo), not a dev dependency, because your notebooks may import your project’s modules and your project’s users may want to run them.
You might be wondering: if Zed has been running Ruff and basedpyright throughout the book, why add them as dependencies now? The answer is that the same tool can live in different places, and each place serves a different purpose.
Zed’s built-in version runs as a language server inside the editor. It formats your code on save and shows diagnostics inline. This is what you’ve been using throughout the book. But this version only talks to Zed. It can’t be run from the terminal, which means you can’t pipe its output to other tools, include it in a CI pipeline, or hand it to an AI coding agent.
The project dependency (what uv add --dev installs) puts the tool in your project’s virtual environment. Now you can run uv run ruff check or uv run basedpyright from the command line. The output goes to your terminal, where you can read it, copy it into a prompt, or let an automated tool act on it. And because it’s declared in pyproject.toml, anyone who clones your project gets the same version of the same tool automatically.
This distinction, editor integration vs. command-line access, shows up throughout the Python ecosystem. DuckDB follows the same pattern: you installed the CLI (via brew or winget) for interactive SQL work, then added the duckdb Python library (via uv add) for scripting. Different installation, different interface, same engine underneath.
From this module onward, the command-line versions become essential. When you work with AI coding tools, the agent needs to run uv run ruff check and read the output to fix your code. When you set up continuous integration, the CI server runs uv run basedpyright to verify types on every push. The editor version is convenient; the project dependency is what makes the tool part of your engineering workflow.
22.3.4 Beyond the Project: uv tool install
Everything so far has been scoped to a single project. Dependencies live in pyproject.toml, packages install into the project’s .venv/, and uv run ensures you’re always working inside that environment. This is the right model for real work, and it’s what you should default to.
But sometimes you want a tool that isn’t tied to any particular project. Maybe you want to open Marimo to sketch out an idea before you’ve even run uv init. Maybe you want a CLI utility available system-wide without adding it to every pyproject.toml. That’s what uv tool install is for.
terminal
uv tool install marimo
This creates an isolated virtual environment just for Marimo in a central location (~/.local/share/uv/tools/marimo/ on macOS and Linux), then symlinks the marimo command into ~/.local/bin/ so it’s on your PATH. You can now run marimo edit from any directory, no project required. Each tool gets its own isolated environment, so Marimo’s dependencies can never conflict with another tool’s.
If you’ve used R, this model might feel familiar. R packages install into a user-level library that’s available regardless of which project directory you’re in. uv tool install works the same way, except each tool gets its own isolated environment rather than sharing a single library.
The important distinction is persistence. uvx marimo edit (which you may see in Marimo’s documentation) creates a temporary, cached environment that gets cleaned up. uv tool install marimo creates a permanent one that survives across sessions. If you want to experiment with Marimo regularly without tying it to a project, the installed version is more convenient.
You can even install extras:
terminal
uv tool install "marimo[recommended]"
This gives you Marimo plus its recommended set of visualization and data libraries, available globally. (The quotes keep shells like zsh from interpreting the square brackets.)
If you have both a globally installed tool (uv tool install ruff) and a project dependency (uv add --dev ruff), running uv run ruff inside a project uses the project’s version. The global install is a fallback for when you’re outside any project. This means you can’t accidentally use the wrong version in a project that specifies one.
For this book, we keep everything in the project with uv add. This makes your work reproducible and your tools versioned. But knowing that uv tool install exists gives you an escape hatch for quick experimentation outside of a project context.
22.3.5 [project.scripts]
This section defines CLI entry points: commands that become available after installation. The line northwind = "northwind_analysis.cli:app" means: when someone runs the northwind command, execute the app object from the northwind_analysis.cli module. You’ll build this in Chapter 26.
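The target on the right-hand side of an entry point is ordinary Python: a callable object in an importable module. As a rough sketch (the real version in Chapter 26 uses Typer, and this stand-in module path is hypothetical), the target could be as simple as:

```python
# Sketch of src/northwind_analysis/cli.py -- a hypothetical, minimal
# stand-in for the Typer app that Chapter 26 builds.

def app() -> None:
    """Entry point: runs when someone types `northwind` in the terminal."""
    print("northwind-analysis CLI: try `northwind --help`")

# Installing the package wires `northwind = "northwind_analysis.cli:app"`
# to this callable; invoking it directly simulates the installed command.
if __name__ == "__main__":
    app()
```

When the package is installed, the build backend generates a small launcher script named `northwind` that imports this module and calls `app()`.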
You have this pyproject.toml snippet:
[project]
name = "data-processor"
version = "0.2.0"
requires-python = ">=3.12"
dependencies = [
    "polars>=0.20",
    "duckdb~=0.9",
    "requests>=2.28",
]
[dependency-groups]
dev = [
"pytest>=7.0",
"ruff>=0.1",
]
[project.scripts]
process = "data_processor.cli:main"
Answer these questions: 1. What is the minimum Python version required? 2. What exact range of DuckDB versions is acceptable? 3. What command becomes available after installation, and which Python function does it call?
1. Python 3.12 or higher (`>=3.12`).
2. DuckDB >=0.9 and <1.0 (the `~=0.9` specifier allows minor and patch updates within the 0.x series, but not 1.0).
3. The `process` command calls the `main` function in the `data_processor.cli` module.
22.3.6 [build-system]
The build system tells uv (and other tools) how to package your project for distribution. Hatchling is a modern, fast build backend that works well with uv. You rarely need to change this section.
22.4 Dependency Management
22.4.1 Adding and Removing Dependencies
terminal
# Add a runtime dependency
uv add polars
# Add a development dependency
uv add --dev ruff
# Remove a dependency
uv remove xlsxwriter
Each uv add command updates pyproject.toml and regenerates the lock file.
22.4.2 The Lock File
When you run uv add or uv lock, uv creates (or updates) a uv.lock file. This file records the exact version of every package that was installed, including transitive dependencies (the dependencies of your dependencies).
The lock file solves the “it works on my machine” problem. When a collaborator runs uv sync, uv reads the lock file and installs the exact same versions you used. No version mismatches, no surprises.
uv.lock to Git
The uv.lock file belongs in your Git repository. It’s what makes your project reproducible. Without it, uv sync resolves dependencies fresh, potentially picking different versions than you tested with.
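To make the idea concrete, here is an abridged, illustrative excerpt of what a locked package entry can look like. The real file is machine-generated, far longer, and its exact schema is an implementation detail of uv, so treat the field names and version numbers below as illustrative only:

```toml
# Abridged, illustrative excerpt of a uv.lock file -- do not edit by hand.
version = 1
requires-python = ">=3.13"

[[package]]
name = "polars"
version = "1.12.0"
source = { registry = "https://pypi.org/simple" }
```

The point is not the schema but the precision: where pyproject.toml says `polars>=1.0`, the lock file records the single exact version that was resolved and installed.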
22.4.3 Syncing an Environment
terminal
# Install all dependencies (runtime + dev) from the lock file
uv sync
# Install only runtime dependencies
uv sync --no-dev
After cloning a project, uv sync is the first command you run. It creates a virtual environment, installs all dependencies at the locked versions, and makes everything ready to use.
22.4.4 The Universal Command: uv run
Throughout this book, you’ve used uv run to execute scripts. What uv run actually does is ensure the virtual environment exists and is synced, then run the command inside that environment. You never need to manually activate a virtual environment:
terminal
# These all work without manual environment activation
uv run python script.py
uv run ruff format
uv run basedpyright
uv run marimo edit notebook.py
uv run quarto render report.qmd
This is one of uv’s best features. In other Python workflows, you have to remember to activate the environment before running commands. With uv run, the environment management is automatic.
22.5 Publishing and Distribution
22.5.1 Publishing to PyPI
PyPI, the Python Package Index, is the central repository where Python packages are published and installed from. When you run uv add polars, uv downloads Polars from PyPI. You can publish your own packages there too.
The workflow is:
terminal
# 1. Build your package (creates dist/ with wheel and sdist files)
uv build
# 2. Publish to Test PyPI (for practice)
uv publish --publish-url https://test.pypi.org/simple/
# 3. Publish to real PyPI (when ready)
uv publish
Before publishing, ensure your pyproject.toml has complete metadata: name, version, description, author, license, and a README. These appear on the package’s PyPI page and help users decide whether to install it.
Test PyPI is a separate instance of the package index meant for experimentation. Publishing there doesn’t affect the real PyPI and lets you practice the workflow without consequences. Create an account at test.pypi.org and try publishing a small package.
22.5.2 Installing from GitHub
Not every project needs to be on PyPI. For sharing code with teammates or classmates, you can install directly from a GitHub repository:
terminal
# Install from a GitHub repo
uv add git+https://github.com/username/northwind-analysis
# Install a specific tag or branch
uv add git+https://github.com/username/northwind-analysis@v0.1.0
This is useful for projects that are actively developed or not ready for a public release. Push to GitHub, share the URL, and your collaborator can install it with a single command.
22.6 The Complete Project
A professional Python project includes more than source code. Here’s the full inventory of files and their purposes:
| File | Purpose | Introduced |
|---|---|---|
| `pyproject.toml` | Project metadata, dependencies, tool config | This chapter |
| `uv.lock` | Exact dependency versions for reproducibility | This chapter |
| `.python-version` | Pins the Python version | Chapter 12 |
| `ruff.toml` | Formatting and linting configuration | Chapter 23 |
| `pyrightconfig.json` | Type checking configuration | Chapter 23 |
| `.gitignore` | Files Git should ignore | Foundations |
| `README.md` | Project documentation | This chapter |
| `src/` | Source code (for `--package` projects) | This chapter |
| `notebooks/` | Marimo notebooks | Appendix I |
| `data/` | Data files (database, CSV, Parquet) | Chapter 16 |
22.6.1 Writing a README
The README is the front door to your project. It’s the first thing a visitor sees on GitHub, and it determines whether they keep reading or move on. A good README answers four questions:
What does this project do? A brief description of purpose and scope. One or two sentences is enough.
Why does it exist? Context that helps the reader understand the motivation. For a personal project, this might reference what problem it solves or what dataset it analyzes.
How do I install it? Step-by-step instructions that a reader can follow without asking you for help. For a uv-managed project, this is usually uv sync.
How do I use it? Examples of running the main commands or scripts. Show the exact terminal commands and, if helpful, sample output.
Here’s a template:
README.md
# Northwind Analysis
Analytical workflows for the Northwind trading database, built with
DuckDB, Polars, Altair, and Python. Generates revenue reports, trend
visualizations, and formatted Excel deliverables.
## Installation
```bash
git clone https://github.com/username/northwind-analysis.git
cd northwind-analysis
uv sync
```
## Usage
Generate a revenue report:
```bash
uv run northwind revenue --category "Beverages" --output report.xlsx
```
Open the interactive analysis notebook:
```bash
uv run marimo edit notebooks/analysis.py
```
## Project Structure
```{.text filename="output"}
src/northwind_analysis/ Source code and modules
notebooks/ Marimo analysis notebooks
data/ Northwind database and flat files
output/ Generated reports
```
Write the README for your audience. If it’s a learning project, your audience is collaborators or future employers who have sixty seconds to decide whether your work is worth a closer look.
22.6.2 Git Hygiene for Python Projects
Not everything in your project directory belongs in Git. Generated files, caches, and environment directories should be ignored:
.gitignore
# Python
__pycache__/
*.pyc
*.pyo
.venv/
# Tools
.ruff_cache/
.mypy_cache/
# OS
.DS_Store
Thumbs.db
# Project
output/
*.xlsx
What should be in Git: pyproject.toml, uv.lock, all source code, notebooks, configuration files (ruff.toml, pyrightconfig.json), data files that are small enough to version (the Northwind .duckdb file), and the README.
Revisiting a principle from Foundations: write commit messages that explain why you made a change, not just what changed. “Add Polars dependency for DataFrame transformations” is more useful than “update pyproject.toml.” Your future self and your collaborators will thank you.
22.6.3 Verifying Reproducibility
The ultimate test of a well-engineered project is this: can someone else reproduce your entire environment from the repository alone?
terminal
# The reproducibility test
rm -rf .venv/ # Delete the virtual environment
uv sync # Recreate it from the lock file
uv run python -c "import northwind_analysis; print('Success!')"
If this works, your project is reproducible. If it fails, something is missing from your pyproject.toml or your lock file is out of date.
You committed your project to GitHub with these entries in pyproject.toml:
[project]
dependencies = [
    "polars>=0.19",
    "duckdb>=1.0",
]
A collaborator clones your project and runs uv sync, but you forgot to commit the uv.lock file, so uv resolves the dependencies fresh. They get Polars 0.20.3 and DuckDB 1.2.1, while your own local uv.lock recorded Polars 0.19.5 and DuckDB 1.0.8. Now your collaborator’s analysis produces different results than yours, and you both spend an hour debugging before realizing the version difference.
How does the lock file prevent this? What is the lock file’s job, and why should you always commit it to Git?
The lock file records the exact versions that were installed when you ran uv add or uv lock. When your collaborator runs uv sync, they install from the lock file, not from pyproject.toml. This means they get Polars 0.19.5 and DuckDB 1.0.8, exactly matching your environment.
The lock file’s job is reproducibility. The pyproject.toml expresses your intent (any Polars >= 0.19 will work), while the lock file records the reality (we tested with 0.19.5). When the same versions are installed everywhere, you avoid version-related surprises.
Always commit uv.lock to Git because it’s what makes a project reproducible across machines and time. Without it, each person running uv sync might get different dependency versions, leading to inconsistent behavior.
Think of it like assembly instructions for a product. The pyproject.toml lists the parts (dependencies). The uv.lock specifies the exact part numbers (versions). The README provides the step-by-step assembly guide. If any of these are missing or inaccurate, the person on the other end can’t build what you built.
You’re starting a new data analysis project. You need Jupyter-like interactivity but also DuckDB and Polars for the analysis itself. You’re torn between two approaches:
Option A: uv init --package the project, then uv add marimo duckdb polars
Option B: Keep marimo as a global tool with uv tool install marimo, and uv add duckdb polars only to the project
Which approach is correct, and why? How would your choice affect a collaborator who clones your project?
Option A is correct. Marimo should be a project dependency because your notebooks are part of the project’s deliverable. A collaborator who clones your project and runs uv sync should get marimo automatically, so they can open and run your notebooks without additional installation steps.
Option B would leave marimo out of the project’s environment. A collaborator could clone and sync, but then uv run marimo edit would fail unless they separately installed marimo as a global tool.
The principle: if a tool is needed to use your project, it belongs in pyproject.toml as a regular dependency (or in [dependency-groups] if it’s only needed by developers). Global tools via uv tool install are for personal utilities that aren’t tied to specific projects.
Exercises
Three Modes
Create three separate projects using uv init, uv init --package, and uv init --library. Compare the generated files and directory structures. Write a short note in each project’s README explaining when you would choose that mode.
Northwind pyproject.toml
Write a complete pyproject.toml for your Northwind analysis project. Include all runtime dependencies (DuckDB, Polars, Altair, XlsxWriter, Typer) and development dependencies (Ruff, basedpyright, Marimo). Set appropriate version specifiers and fill in all metadata fields.
Publish to Test PyPI
Create a small utility package (it can be as simple as a module with a few helper functions), publish it to Test PyPI, then install it in a separate project with uv add --index-url https://test.pypi.org/simple/ your-package-name. Verify that the installed package works.
Peer Installation
Push your Northwind project to GitHub. Have a colleague install it with uv add git+https://github.com/your-username/northwind-analysis. If it fails, debug the issue together and fix the project configuration until it works.
Template Repository
Create a personal template repository on GitHub with your preferred project structure, .gitignore, ruff.toml, and pyrightconfig.json. Use it the next time you start a project to save setup time.
Reproducibility Check
Delete your project’s .venv/ directory, run uv sync, and confirm that everything works. Run your main script and your notebook to verify. If anything breaks, fix the pyproject.toml until the project is fully reproducible from the lock file alone.
README Review
Write a README for your Northwind project that is detailed enough for someone unfamiliar with it to set up and use the project from scratch without asking you any questions. If possible, trade READMEs with someone else and attempt to follow each other’s instructions exactly. Note where the instructions were unclear and revise.
Summary
Project engineering is the discipline of making your work reproducible and shareable. uv manages the full project lifecycle: initializing projects with the right structure, declaring dependencies in pyproject.toml, locking exact versions in uv.lock, syncing environments with uv sync, and distributing packages via PyPI or GitHub. The three uv init modes, simple, package, and library, provide the right starting structure for different project types.
A professional project is more than working code. It includes a pyproject.toml with complete metadata and dependencies, a uv.lock for reproducibility, a .gitignore that keeps generated files out of version control, and a README that guides collaborators through installation and usage. The test of quality is simple: can someone else clone your repository and reproduce your results with uv sync?
Every subsequent chapter in this module builds on the project structure established here. Quarto documents render within the project environment. Ruff and basedpyright are configured in the project’s configuration files. The CLI tool you build in Chapter 26 is registered as an entry point in pyproject.toml. Get the project structure right, and everything else has a place to live.
Glossary
- build backend: The tool that packages your Python code for distribution. Hatchling is the default for uv projects. The build backend is specified in [build-system].
- dependency group: A named set of dependencies for a specific purpose, like dev for development tools. Groups are declared in [dependency-groups] and installed with uv sync.
- entry point: A command that becomes available when a package is installed. Defined in [project.scripts] as a mapping from command name to a Python function.
- lock file: A file (uv.lock) that records the exact version of every installed package. Ensures reproducible installations across different machines and times.
- pyproject.toml: The standard configuration file for Python projects. Contains metadata, dependencies, tool configuration, and build instructions.
- PyPI: The Python Package Index, the central repository where Python packages are published. Packages are installed from PyPI by default when you run uv add.
- sdist: A source distribution, a packaged archive of your project’s source code. Built with uv build.
- semantic versioning: A versioning scheme where version numbers follow the pattern MAJOR.MINOR.PATCH. Major changes break compatibility, minor changes add features, patch changes fix bugs.
- uv tool install: Installs a Python CLI tool into a central, isolated virtual environment so it’s available system-wide without being tied to any project. Useful for tools you want to access outside a project context.
- virtual environment: An isolated Python installation with its own set of packages. uv manages virtual environments automatically in the .venv/ directory.
- wheel: A built distribution, a pre-compiled package ready for installation. Faster to install than an sdist because no build step is needed.