Data with SQL
In the Professional Toolkit, you learned to navigate your computer like an engineer: managing files from the command line, authoring documents with Quarto, tracking changes with Git, and collaborating through GitHub. Those skills gave you the infrastructure to work reliably. Now we need something to work on.
This unit introduces databases, the structured systems that store and organize the data you’ll spend your career analyzing. More importantly, it introduces SQL (Structured Query Language), the language that lets you ask questions of that data. SQL is one of the most enduring and widely-used languages in computing, and for good reason: it lets you express complex analytical questions in a few lines of declarative logic, then lets the database figure out the fastest way to answer them.
We’ll use DuckDB throughout this unit, an analytical database that installs in seconds and runs entirely on your laptop. You won’t need to configure servers, manage permissions, or request access from an IT department. By the end of the first chapter, you’ll be writing real queries against real data.
What You’ll Learn
Databases (7 Databases) introduces what databases are, where they fit alongside spreadsheets, and why different kinds exist. You’ll set up DuckDB, load the Northwind dataset, and write your first SELECT query.
Querying Data (8 Querying Data) teaches the anatomy of a SQL query: SELECT, FROM, WHERE, ORDER BY, LIMIT, aggregate functions, and GROUP BY. You’ll learn to filter, sort, and summarize data from a single table.
Joining Tables (9 Joining Tables) brings multiple tables together. You’ll learn about keys, relationship types, and the join operations (INNER, LEFT, RIGHT, FULL) that let you combine data across the Northwind schema.
Analytical SQL (10 Analytical SQL) introduces advanced patterns: multi-column grouping, HAVING, subqueries, Common Table Expressions (CTEs), and window functions. These are the tools that make SQL a serious analytical language, not just a data retrieval tool.
Data Modeling (11 Data Modeling) flips the perspective. Instead of querying existing databases, you’ll design and build your own, learning about normalization, dimensional modeling, ER diagrams, constraints, views, and the DML statements (INSERT, UPDATE, DELETE) that maintain data over time.
The Northwind Dataset
Throughout this unit, we’ll work with the Northwind database, a classic dataset representing a food import and export company called Northwind Traders. It has suppliers, products, customers, orders, and shippers, which makes it a natural fit for thinking about supply chain and operations problems. Every chapter opens with a business question that creates genuine need for the SQL concepts being introduced, because SQL is a tool for answering questions, not an abstract syntax to memorize.
Standard SQL First
The SQL you learn here is standard SQL, the core that works across PostgreSQL, MySQL, SQL Server, Oracle, and every other major relational database. When we use DuckDB-specific convenience features, we’ll call them out clearly and show the portable alternative. You’re learning a skill that transfers to whatever database you encounter next.
How to Learn SQL Effectively
Don’t just read the examples. For every query you encounter in this book, predict what you think the output will look like before you run it. When the result surprises you, stop and figure out why. Try modifying the query, break it on purpose, then fix it. SQL rewards experimentation, and the DuckDB UI gives you instant feedback with no risk of damaging anything. The goal isn’t to memorize syntax; it’s to build intuition for how data flows through a query so you can think and act flexibly when facing new problems.