Creating a Database from Scratch Pt. 1: Data Modeling
- Rodolfo Lima da Silva
- Apr 30, 2022
- 3 min read
Updated: Oct 20, 2022
Note: Full Github repository with code and files used in the project is available here
1. Introduction
Information management is a crucial part of any company or institution that aims to scale up its business and operations. That's when a database system comes into play. However, before opening your SQL terminal to code or use specific software to generate a database, there are frameworks to design a database that will make your life easier in case you are required to do the job. It also works perfectly with any type of tabular data structure like a set of tables in CSV or spreadsheets.
2. Understanding The Business
First of all, understanding the business should ever be your first step to ensure the project will imbibe business requirements and expectations.
The project that I will show you is part of a digital transformation headed by me in a non-profit organization whose goal is to make education more democratic. The organization’s name is Grupo Pró-Estudar (GPE), and it offers preparatory courses for entrance exams in Brazilian universities for low-income individuals.
The main purpose here is to create a database to centralize the data we receive from our processes. We receive data from candidates who fill the socio-economic forms, as well as their performance in our entrance exam. Once the candidate is eligible to become our student, we track their personal information and results in tests that we apply over the year. We also would like to store information about our volunteers.
The project for modeling a database might be divided into 3 phases: Conceptual data model, Logical data model, and Physical data model.
3. Conceptual Data Model
A conceptual design is a simple schema that shows only entities of the database and their relationships. Getting a pencil and a sheet, you can do one. At this stage, it is not necessary to give many details, only entities that will become tables in the future and their connections. Follow my conceptual design below:

4. Logical Data Model
In the Logical data model stage, we are interested in listing the entities and attributes in the project (tables and columns name in the future), foreign keys, unique keys, and primary keys. We go deeply into details about the relationship between the entities and classify them by drawing technical symbols like one-to_many/many-to-many concepts. An example of logical data models is illustrated below:

If you are in doubt about why the Volunteer table is related to the student table, well, in our institution every student can have a volunteer as a mentor, so the entity volunteer and student are connected with each other.
Note: I would rather name the entities (tables) and features (columns) in the logical phase, just a personal preference.
5. Physical Data Model
In the last phase, the physical data model is built from the previous two models and consists in converting entities into tables, relationships into foreign keys, and attributes into columns. We also set the table names, column names, and data type of columns and do some changes based on project requirements and constraints. In a nutshell, we set the technical requirements and restrictions needed to start selecting a DBMS (Database Management System) and create physically the database, tables, and their features.
A physical model is the evolution of the logical model, so the logical model plus the data dictionary will work as a physical model design. A data dictionary is a document in which everybody can see all tables name, column names, the data type, if the column accepts null values or not, descriptions, and additional information, as you can see below:

6. Get your Hands Dirty
Now we have the list of ingredients to build a database. I picked up the MySQL DBMS to store my database. In other words, I chose GCP (Google Cloud Platform) as a MySQL Cloud Hosting. The server is hosted in Cloud, which implies I can access it locally on my computer using a MySQL Database Software like MySQL Workbench. Take a look at the overview of the MySQL server hosted in GCP:

After that, we can certainly code and create the database and ist tables. The following figure shows a piece of my SQL queries

7. Conclusion
The difference between the three phases and the steps of each section can be summarized in the following image:

From a single conceptual data model, we can go through all steps to the logical models. From that logical data model, we can create as many physical models as we want, each of them designed for a particular database engine, as shown below:

In the next article, we'll discuss how to create a data workflow like an ETL process to automatically extract the data from the sources, modify the data if needed, and after submitting it to the dataset that was created.
References
https://www.datacamp.com/blog/data-modeling