Database design for Google Calendar, pt. 1


Hello and welcome to the first post in the newsletter for the book "Database Design using Minimal Modeling".

Introduction

In this series of posts I’m going to show how you can build a database model for a real-world project, using a systematic approach.

We’ll be designing tables for a Google Calendar clone. We will model as much as possible of the functionality that is directly related to the calendar.

This series illustrates an approach explained in the book called “Database Design using Minimal Modeling”, scheduled to be released in Spring 2024. Here is the website of the book: https://databasedesignbook.com/. You can subscribe to receive updates on the book and to get notified of new posts in this series, and also other material.

We will first build a complete logical model that describes the calendar data to be stored. This should take the biggest part of the series. After the logical model is done, we’ll build table design directly based on the logical model.

I’m writing the text as it goes, so it’s possible that we’ll change some of the modeling decisions later. That will demonstrate how the model handles mistakes, improvements and changing requirements.

Problem description

We’re going to implement a big part of Google Calendar functionality. Some parts we’ll skip, but we’ll try and implement every feature of calendaring, because it’s the main area. Some areas we’ll implement just enough to be able to discuss the central parts.

Google Calendar is a multi-user system. You can invite other people to the calendar events. We’re going to describe only a minimum functionality of the users data.

Events are the central part of Google Calendar, and we’re going to design them as closely as possible to the real thing.

For starters, events have title and description, as well as some other minor attributes such as location.

Here is the design space of calendar events with regard to time and date:

1/ All-day events vs time-based.

2/ All-day events can spread over multiple days;

3/ Time-based events:

  • Can have associated time zone;
  • Have begin and end time;
  • Begin and end time can happen on different days;
  • Begin and end time can be in different timezones;

4/ Finally, both all-day and time-based events:

  • Can be repeated daily, or every N days;
  • Can be repeated weekly, on certain days of the week; again, it can be every two or more weeks;
  • Can be repeated monthly, on a certain day or day of the week;
  • Can be repeated annually;
  • Repeating events can go on forever, until a certain date, or for a certain number of repetitions;

If you have a repeating event, you can move some events to a different date/time. You can also delete some of the repeating events, leaving the rest intact.

You can change the schedule of the repeated event. For example, you can switch from two project meetings every week on Tuesday and Thursday to one meeting every two weeks, on Fridays.

Here is a screenshot of day event editing form:

Part 1: Basic all-day events

Anchors

First thing we need is to find some so-called anchors. Anchors are also known as entities. Anchors are usually nouns, such as User and Event.

Anchors are extremely obvious in simple cases, but may get tricky in non-obvious cases. We’re going to write down everything, no matter how obvious it seems.

First two anchors that come to mind are:

There is no data in the anchors. Basically the only thing that anchors handle is IDs and counting.

So, for example, in our tables there would be a User with, say, ID=23, and DayEvent with ID=100, etc.

We won’t be dealing with the last column for now, we’ll discuss the physical model later in the series.

To validate that we have an anchor, we can construct a sentence about it, and if this sentence makes sense then this should be an anchor. Example sentences:

  • There are 200 Users in our database”;
  • When this form is submitted, a new User is added to the database”;

Same for DayEvent:

  • There are 3000 DayEvents in our database”;
  • When this button is clicked, a new DayEvent is created”;

Such sentences will be useful later in more complicated cases.

Attributes of User

Attributes store the actual information about anchors.

Which data about users should we model? Users are ubiquitous, and different systems may want to store a lot of information about particular users. For this series, we’re going to model the bare minimum of data: emails.

What can we see here?

  • This attribute belongs to the User anchor that was defined in the previous section;
  • We use questions to describe all sorts of attributes. Later on we’ll discuss why we prefer this style over “User’s email” and such;
  • Logical type is quite simple. If you expected “VARCHAR(128)” here or something like that: we’ll discuss this much later.
  • We show an example value that helps us to confirm our thinking. Again, in simple cases this is very obvious, but it would help reviewers to confirm that everyone is on the same page.
  • We won’t be dealing with two last columns for now, we’ll discuss the physical model later in the series.

We’re going to see more examples of logical types later in the series. We extensively discuss the logical types in the book.

We can skip ahead a little bit, and show how the data about the users looks like. We’ll be using a simple strategy to design physical tables, so the result is going to be completely unsurprising:

This is just to show the part of the final result so that you know where we’re heading to. Here, a user with ID=2 has email “cjdate@example.org”, and a user with ID=3 has email “someone@else.com”.

Other than that, we won’t be talking much about the users and focus almost entirely on calendaring functionality.

Attributes of DayEvent

Suppose that we want to record that there is a two-day company retreat that begins on January 14th, 2024. In terms of anchors, this is going to be a DayEvent.

Looking at the paragraph above, we can see that we need to store the following data about the DayEvents:

  • Name of the event;
  • The begin date and end date of the event.

Let’s write that down in our table:

What can we see here?

  • We defined the first three attributes;
  • We don’t have any short names for the attributes, and this may bother us a little. We’d expect to have something like “DayEvent_name” or some other identification to refer to the attribute later in the text. We’ll return to this topic later.
  • We have a new logical type: date. We won’t have to deal with timezones in this introductory post.
  • Probably for most of the events in actual calendars the begin date and end date are the same (most events are single-day). We’ll just store the same date in both attributes. This allows us to treat the special case (single-day events) as the general case (multi-day events). This is a general design strategy, but we’re going to investigate if this line of thinking is always applicable.

Links

Where do we store the information that this particular user has created this particular DayEvent? On the first glance this may look like an attribute of DayEvent, right? (Actually, no).

Attributes cannot contain IDs. Instead, when two anchors are involved, we’re going to use links.

Links may look unexpectedly complicated, but I hope to demonstrate that links help to pin down the correct design for many complicated scenarios. What do we see in this table?

  • Link connects two anchors. Skipping ahead, we can have both different anchors and the same anchor. For example, “Employee is a manager of Employee” is one such link.
  • We use a weird notation with the “<” character (other possibilities are: “”, “=”, and “>”). It is the same as “1:N”, but additionally it allows you to specify which is “1” and which is “N”.
  • We use formalized sentences that involve two anchors, a verb, and information about cardinality.
  • We always use two formalized sentences, explaining both directions of the link. Later we’ll see how it helps with the design.
  • The sentences allow us to double-check and document our design.
  • We write down the cardinality again, in a familiar way in a separate column. Pinning down cardinality is essential, so we make a lot of fuss about it.

(We’ll discuss the links much more, of course).

Physical model

Again skipping ahead: if we would have stopped right now, and tried to write down the physical design for the schema that we have so far, here is what we’d see. This is just to confirm that we’re heading in a familiar direction:

Conclusion

This was a first peek into the systematic database design process, as explained in https://databasedesignbook.com/. Subscribe to get notified of new posts in this series, book updates, and other material.

We started with a very simple subset of the task at hand (Google Calendar clone). We built three simple lists in a tabular form: list of anchors, attributes and links. We’re going to continue expanding those lists as we handle more requirements for our calendaring system.

In the next post we’ll see how the current schema extends to handle time-based events. The draft plan of the series is:

  • Time-based events;
  • Repeated events;
  • “What should happen” vs “What actually happened” pattern;
  • Overriding some events from the repeated series;
  • Physical schema building;

If you have any questions or some material was confusing or incomplete, feel free to contact me via email. If you know anybody who would be interested in learning database modeling, send them this email or a link to the https://databasedesignbook.com/.

Minimal Modeling

Minimal Modeling: making sense of your database

Read more from Minimal Modeling

Previously in the series This is the final post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-3 “Repeated all-day...

Previously in the series This is the fifth post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-3 “Repeated all-day...

Introduction This is the fourth post in the “Modeling Google Calendar” series. Previously: https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-1 for introduction, problem description and the first part, “Basic all-day events”; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-2 “Time-based events”, and some discussion of timezones; https://databasedesignbook.com/posts/database-design-for-google-calendar-pt-3 Repeated all-day events; This...