
https://xkcd.com/327/
Let’s return to our example world database introduced in Week 1. Recall that we had three tables: country, city, and countrylanguage.
Each of these tables has a column (or set of columns) that uniquely identifies each row of that table:
code from countryid from citycountry_code and language from countrylanguageThese special columns are called primary keys.
Primary keys have a few requirements:
NULL.code in the country table).Sometimes, having a single column uniquely identify a row is not enough. In this case, multiple columns can be used together to form a composite primary key. That’s the scenario we see in the countrylanguage table:
country_code could not be a primary key)language could not be a primary key)country_code and language is usefulLet’s reexamine the city table:
-- +----+----------------+-------------+---------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +----+----------------+-------------+---------------+------------+
-- | 1 | Kabul | AFG | Kabol | 1780000 |
-- | 2 | Qandahar | AFG | Qandahar | 237500 |
-- | 3 | Herat | AFG | Herat | 186800 |
-- | 4 | Mazar-e-Sharif | AFG | Balkh | 127800 |
-- | 5 | Amsterdam | NLD | Noord-Holland | 731200 |
-- | 6 | Rotterdam | NLD | Zuid-Holland | 593321 |
-- | 7 | Haag | NLD | Zuid-Holland | 440900 |
-- | 8 | Utrecht | NLD | Utrecht | 234323 |
-- | 9 | Eindhoven | NLD | Noord-Brabant | 201843 |
-- | 10 | Tilburg | NLD | Noord-Brabant | 193238 |
-- (...)
As mentioned earlier in Week 1 - Combining Tables, the values of city.countrycode match the values of country.code (the primary key for country).
When the primary key for one table appears in a different table, it is called a foreign key.
So code is a primary key when it appears in country, but a foreign key when it appears in city. The most common types of joins will be joining a foreign key from one table with the primary key from another table.
[!IMPORTANT] When a column is set as a foreign key, its value must exist in the original table as a primary key. For example, inserting a row with
countrycode = 'XXX'into thecitytable is not possible because there’s no country with code ‘XXX’ in thecountrytable.
[!NOTE] In the
citytable, the values for columnnameare also unique, so it could also be used as a primary key. However, generally, the primary key will just be calledid.
UNIQUE, NOT NULL, CHECK, and DEFAULTPrimary keys and foreign keys are the so-called table-level constraints. There are other constraints that apply only to the values of individual columns.
Here are some of the column constraints that can be set:
UNIQUE columns have a different value for every row. This is similar to a primary key except a table can have many different UNIQUE columns.
country.name can be UNIQUE (there shouldn’t be 2 countries with the same name), even though code is the primary keyNOT NULL columns must have a value. Attempts to insert a row without a value for a NOT NULL column will result in a constraint violation and the new row will not be inserted.
population of a country is important information and we’d like to prevent having rows that don’t set itCHECK columns allow checking for a condition
country.population must be greater than 0DEFAULT columns are configured to have a given value automatically set if the new row does not specify a value for that column.
isOfficial to true (‘T’) whenever a new row is added to countrylanguage, if a value is not specified[!NOTE] We’ll talk more about how to set these constraints in the “Creating your first database” section.
In this section, we will learn how to design our own database schemas.
Thus far, we worked with a finished world database. Now, we will look under the hood and see what steps can be used to create such a database.
We are tasked with representing a book review website through a database schema. This includes the books, authors, translators, publishers, and book ratings.
To break down the question further, we need to decide…
Let us look at some different ways in which tables can be related to each other in relational databases.
Consider this case, where each author writes only one book and each book is written by one author. This is called a one-to-one relationship.

On the other hand, if an author can write multiple books, the relationship is a one-to-many relationship.

But actually, neither of those reflect reality. Not only can one author write multiple books, but books can also be co-written by multiple authors. This is a many-to-many relationship.

[!IMPORTANT] There are 3 types of relationships in the relational model:
- one-to-one relationships
- one-to-many relationships
- many-to-many relationships
We just described one-to-one, one-to-many, and many-to-many relationships between tables in a database. It is possible to visualize such relationships using an entity relationship (ER) diagram.
ER diagrams are a useful way to design and explain the conceptual model for a database before creating the database itself. ER diagrams are easy to understand even without a technical background.
Also, it’s relatively simple to convert ER diagrams to the Relational Model (that is, to a SQL representation). Therefore, when designing a relational database, it’s common to start by mapping an ER diagram.
Here’s an ER diagram for the book review problem:

Each entity (Author, Publisher, etc.) will be a table in our database. The relationship between the tables, or entities, is represented by the verbs that mark the arrows connecting entities.
The diagram is following the UML notation (simplified):
0 - no relations1 - one relationn - many relationsThese can be used together to represent the 3 types of relationships:
n -> 1
n -> 0..11 <-> 1
0..1 <-> 0..1n <-> n
1..n <-> nHere are some examples:

Let’s revisit the ER diagram for the book review problem:

By observing the lines connecting the Book and Translator entities, we can say that books don’t need to have a translator (n <-> 0..n). However, a translator in this database translates at least one book, and possibly many.
Let’s see if you got it! Try to answer the following questions:
[!TIP] The exact relationships between entities are up to the designer of the database. For example, whether each author can write only one book or multiple books is a decision to be made while designing the database. An ER diagram can be thought of as a tool to communicate these decisions to someone who wants to understand the database and the relationships between its entities.
Once we know that a relationship exists between certain entities, we need to implement that in our database. For that, we’ll use primary keys and foreign keys.
Let’s start with primary keys. For each entity/table, we’ll assign them a unique id column. Here’s an example books table:

Now we need to map the relationships. We do that with the help of foreign keys.
For each type of relationship (one-to-many, one-to-one, many-to-many) there’s a way to convert them to the relational model (a SQL table).
To represent the relationship between Books and Ratings, we add the primary key of books as a column in the ratings table. This helps form a one-to-many relationship between the two tables — a book with a title (found in the books table) can have multiple ratings (found in the ratings table).
Here’s the books and ratings table:

[!IMPORTANT]
idis a primary key inbooks, butbook_idis a foreign key inratings.
To represent the relationship between Books and Authors, we create a separate table - authored. This table maps the primary key of books - book_id with the primary key of authors - author_id.
This helps form a many-to-many relationship between the two original tables - an author (found in the authors table) can write multiple books (found in the books table), represented in the authored table by the multiple entries with their author_id. Simultaneously, a book can be written by multiple authors, found in the authored table as multiple entries with the same book_id.
Finally, author_id and book_id together form the (composite) primary key of the authored table. This uniquely identifies the relationship and ensures that an author can’t be related to the same book more than once, and vice versa.
Here’s the books, authors, and authored tables:

[!TIP] Tables like
authoredare called “joint” or “junction” tables.
[!NOTE] There is no one-to-one relationship in our ER diagram. See the Appendix for an example of how to store such a relationship in your tables.
It’s your turn!
Based on the ER diagram, design the tables to represent the relationship between Books and Publishers. In this exercise, for the Publishers, we just need to store their name.
Hint - you will need:
publishers tablebooks table[!NOTE] The process of separating our data in this manner is called normalizing. When normalizing, we put each entity in its own table—as we did with authors, books, and others. Any information about a specific entity, for example an author’s name, goes into the entity’s own table.
So far we’ve been focusing on the relationships between our tables. That’s a purposeful choice - relationships are hard to change once they’re in place, so it’s important to get them right and make sure they match our problem.
However, we also need to think about what data we’re storing in our tables - data types - and to think of what restrictions we want to impose on them - constraints.
SQL supports a range of data types across widely used classes of data, such as the following:
These data types are found across all flavors of SQL (for example SQLite, and PostgreSQL). However, some versions of SQL may support several distinct data types of a particular class while others may only have one.
For example, MySQL has the following options to just represent integer values (numbers without decimal points): INTEGER, INT, SMALLINT, TINYINT, MEDIUMINT, BIGINT [3].
For now, we’ll use the most flexible options:
INTEGER for integers (for example - 21), DECIMAL for numbers with decimal points (for example - 21.02)TEXTDATE for dates (2024-06-13), DATETIME for dates with time (2024-06-13T20:59:56.756Z)For example, we could store:
author.name as TEXTratings.score as INTEGERbook.published_on as DATEAfter we’ve decided on which data types to use for all columns, we need to think about how to restrict them using constraints. We usually do this according to what makes sense in our domain.
For example, we can set:
book.isbn as UNIQUE (no 2 books have the same ISBN) and NOT NULL (a book must have an ISBN)book.published_on with a DEFAULT of “today”ratings.score with a CHECK constraint ensuring score is between 0 and 5Now it’s your turn! Let’s assume that as part of our database we want to store:
For these 3 examples, what data types and constraints (if any) would you set?
CREATE TABLE[!IMPORTANT] To follow along this section, create a new MySQL database. In DBeaver, you can do that in
Database Navigator -> world -> Databases -> Create -> Database
Give a name to your database (for example,
books) and open a new SQL console:
We now have a better idea of the schema of our tables, so we can finally go ahead and create them.
In SQL, to create a table in our database we use the CREATE TABLE command:
CREATE TABLE table_name (
column_name data_type optional_constraint,
column_name data_type optional_constraint,
-- ...
);
For example, to create the authors and books tables:
CREATE TABLE authors (
id INTEGER,
name TEXT NOT NULL,
country TEXT,
date_of_birth DATE,
PRIMARY KEY (id)
);
CREATE TABLE books (
id INTEGER,
isbn TEXT NOT NULL,
title TEXT,
pages INTEGER,
published_on DATE,
PRIMARY KEY(id)
);
Notice we’re setting the NOT NULL constraint on authors.name and setting authors.id (and books.id) as primary keys.
We can now create the authored table:
CREATE TABLE authored (
author_id INTEGER,
book_id INTEGER,
FOREIGN KEY(author_id) REFERENCES authors(id),
FOREIGN KEY(book_id) REFERENCES books(id),
PRIMARY KEY (author_id, book_id)
);
Books and authors are now connected in a many-to-many fashion, thanks to both foreign keys.
[!TIP] Running
CREATE TABLEon a table that already exists results in an error. If you make a mistake and want to recreate a table you must first drop it.DROP table books:Keep in mind this will **delete all data in the table, and its schema.
You may be wondering, what about publishers? We should also set up a one-to-many relationship between Publishers and Books, but we have already created our books table.
That is not a problem - we can alter a table’s schema via the ALTER TABLE command. First, we create the publishers table:
CREATE TABLE publishers (
id INTEGER,
publisher TEXT,
PRIMARY KEY(id)
);
Now, we can alter the books table to include its publisher - publisher_id, via a foreign key to publishers.
ALTER TABLE books ADD COLUMN publisher_id INTEGER;
ALTER TABLE books ADD FOREIGN KEY (publisher_id) REFERENCES publishers(id);
It’s your turn!
Write the CREATE TABLE command to create the ratings table. Remember to ensure the rating amount is valid. Hint: here’s how to create a CHECK constraint.
To see the complete schema definition including some example data, check the databases/books.sql.
[!TIP] To import the file into your MySQL database, run the following while at the root of the repository:
$ mysql -u "<your-username>" -p < "week2/databases/books.sql" # enter your password when asked
You just learned how to create your first database in a way that matches a real world problem.
In summary:
UNIQUE: Ensures all values in a column are differentNOT NULL: Ensures a column cannot have a NULL valueCHECK: Ensures values in a column meet a specific conditionDEFAULT: Sets a default value for a column if none is providedCREATE TABLE and ALTER TABLE commands if neededThat’s it for Week 2’s theoretical content. Next, you should focus on the exercises for this week.
[1] Adapted from Codecademy’s “Learn SQL” course
[2] Adapted from CS50’s Introduction to Databases with SQL
[3] https://dev.mysql.com/doc/refman/8.4/en/integer-types.html
TOO add one-to-one example