Hello and welcome to the first post in the newsletter for the book "Database Design using Minimal Modeling".
In this series of posts I’m going to show how you can build a database model for a real-world project, using a systematic approach.
We’ll be designing tables for a Google Calendar clone. We will model as much as possible of the functionality that is directly related to the calendar.
This series illustrates an approach explained in the book called “Database Design using Minimal Modeling”, scheduled to be released in Spring 2024. Here is the website of the book: https://databasedesignbook.com/. You can subscribe to receive updates on the book and to get notified of new posts in this series, and also other material.
We will first build a complete logical model that describes the calendar data to be stored. This should take the biggest part of the series. After the logical model is done, we’ll build table design directly based on the logical model.
I’m writing the text as it goes, so it’s possible that we’ll change some of the modeling decisions later. That will demonstrate how the model handles mistakes, improvements and changing requirements.
We’re going to implement a big part of Google Calendar functionality. Some parts we’ll skip, but we’ll try and implement every feature of calendaring, because it’s the main area. Some areas we’ll implement just enough to be able to discuss the central parts.
Google Calendar is a multi-user system. You can invite other people to the calendar events. We’re going to describe only a minimum functionality of the users data.
Events are the central part of Google Calendar, and we’re going to design them as closely as possible to the real thing.
For starters, events have title and description, as well as some other minor attributes such as location.
Here is the design space of calendar events with regard to time and date:
1/ All-day events vs time-based.
2/ All-day events can spread over multiple days;
3/ Time-based events:
4/ Finally, both all-day and time-based events:
If you have a repeating event, you can move some events to a different date/time. You can also delete some of the repeating events, leaving the rest intact.
You can change the schedule of the repeated event. For example, you can switch from two project meetings every week on Tuesday and Thursday to one meeting every two weeks, on Fridays.
Here is a screenshot of day event editing form:
First thing we need is to find some so-called anchors. Anchors are also known as entities. Anchors are usually nouns, such as User and Event.
Anchors are extremely obvious in simple cases, but may get tricky in non-obvious cases. We’re going to write down everything, no matter how obvious it seems.
First two anchors that come to mind are:
There is no data in the anchors. Basically the only thing that anchors handle is IDs and counting.
So, for example, in our tables there would be a User with, say, ID=23, and DayEvent with ID=100, etc.
We won’t be dealing with the last column for now, we’ll discuss the physical model later in the series.
To validate that we have an anchor, we can construct a sentence about it, and if this sentence makes sense then this should be an anchor. Example sentences:
Same for DayEvent:
Such sentences will be useful later in more complicated cases.
Attributes store the actual information about anchors.
Which data about users should we model? Users are ubiquitous, and different systems may want to store a lot of information about particular users. For this series, we’re going to model the bare minimum of data: emails.
What can we see here?
We’re going to see more examples of logical types later in the series. We extensively discuss the logical types in the book.
We can skip ahead a little bit, and show how the data about the users looks like. We’ll be using a simple strategy to design physical tables, so the result is going to be completely unsurprising:
This is just to show the part of the final result so that you know where we’re heading to. Here, a user with ID=2 has email “cjdate@example.org”, and a user with ID=3 has email “someone@else.com”.
Other than that, we won’t be talking much about the users and focus almost entirely on calendaring functionality.
Suppose that we want to record that there is a two-day company retreat that begins on January 14th, 2024. In terms of anchors, this is going to be a DayEvent.
Looking at the paragraph above, we can see that we need to store the following data about the DayEvents:
Let’s write that down in our table:
What can we see here?
Where do we store the information that this particular user has created this particular DayEvent? On the first glance this may look like an attribute of DayEvent, right? (Actually, no).
Attributes cannot contain IDs. Instead, when two anchors are involved, we’re going to use links.
Links may look unexpectedly complicated, but I hope to demonstrate that links help to pin down the correct design for many complicated scenarios. What do we see in this table?
(We’ll discuss the links much more, of course).
Again skipping ahead: if we would have stopped right now, and tried to write down the physical design for the schema that we have so far, here is what we’d see. This is just to confirm that we’re heading in a familiar direction:
This was a first peek into the systematic database design process, as explained in https://databasedesignbook.com/. Subscribe to get notified of new posts in this series, book updates, and other material.
We started with a very simple subset of the task at hand (Google Calendar clone). We built three simple lists in a tabular form: list of anchors, attributes and links. We’re going to continue expanding those lists as we handle more requirements for our calendaring system.
In the next post we’ll see how the current schema extends to handle time-based events. The draft plan of the series is:
If you have any questions or some material was confusing or incomplete, feel free to contact me via email. If you know anybody who would be interested in learning database modeling, send them this email or a link to the https://databasedesignbook.com/.
Minimal Modeling: making sense of your database
Previously in the series This is the final post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-3 “Repeated all-day...
Previously in the series This is the fifth post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-3 “Repeated all-day...
Introduction This is the fourth post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-3 Repeated all-day events; This...