CS410
Data Storages

Faculty
Nikolay Golov
CPO of Tengri Data Platform
Overview
All contemporary software platforms, whether developed by large corporations (Facebook, Google, OpenAI, etc.) or small businesses, rely on the use of databases or data storage. The foundation of this course is the notion that data storage is an answer or a solution to a problem rather than a technology in and of itself.
During this course, we will study which problems modern software can solve with data storage. We will cover the whole spectrum of existing data storages, such as classical RDBMS, key-value storages, NoSQL, document storages, column storages, OLAP, vector databases, and embedded and serverless databases, along with their weak and strong points.
Students will learn how to identify the requirements for data storage in a given software system and how to choose a particular data storage (or several) wisely, taking into consideration both business requirements and the chosen software architecture (monolithic, microservice, etc.). We will study all the concepts and mental models needed to understand data storages, choose among them, and embed them into software - manually, as a manager, or using an LLM such as ChatGPT.
Learning highlights
- The course starts with a brief overview of the data-storage task that any software faces, and of its possible solutions: files, in-memory services, or specialised applications (databases).
- We proceed with a list of requirements that proved to be essential for a data storage tool: ACID, transactions, availability of data access languages (SQL, etc.). Afterwards, we will illustrate why these requirements determined the market dominance of classical relational databases (Oracle, MS SQL, PostgreSQL, MySQL, etc.) at the end of the 20th century. Later, we’ll describe why the technological advances of the 21st century gave birth to a set of non-classical databases, such as in-memory storage, document storage, columnar storage, etc.
- The bulk of the remaining course focuses on the tradeoffs to be considered during technology selection and database design. We will discuss the Polyglot Persistence paradigm, which uses multiple databases for different facets of an application, combining their strengths and mitigating their weaknesses. We discuss the balance between performance, complexity, and permitted data delay for various databases and architectural approaches, as well as the fundamental limitations of the CAP theorem. We emphasise the difference between OLAP (analytical) and OLTP (transactional) tasks and cover modern data warehouse designs (Data Vault, Anchor Modelling, etc.).
- Plenty of hands-on examples and homework are given to demonstrate ideas and to compare and contrast various approaches and technologies. The course wraps up with a discussion of modern state-of-the-art databases, such as serverless cloud databases and global cloud tools that appear to circumvent the CAP theorem. We will also discuss how modern LLM tools (ChatGPT) can be used to design data storage applications and how data storage can be used to benefit LLMs (vector databases).
- Throughout the course, we will focus on open-source tools to avoid relying too heavily on particular vendors.
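To give a flavour of the transaction requirement mentioned in the highlights above, here is a minimal sketch of atomicity, assuming Python's built-in `sqlite3` module. The `accounts` table, the `transfer` function, and the sample balances are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` as a single all-or-nothing transaction."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement raises an exception.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)  # both updates are applied

try:
    # The credit to bob succeeds, but the debit would make alice's balance
    # negative, so the CHECK constraint fails and the credit is rolled back too.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
print(balances)  # {'alice': 70, 'bob': 80}
```

The failed transfer leaves no trace: either both updates happen or neither does, which is the guarantee that files or ad hoc in-memory structures do not provide on their own.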
Course outline
15 classes
Session 1
Introduction. Data Storage in General. CRUD. Relational model. SQLite.
Session 2
Data Modelling. ER Modelling. SQL Queries. ACID: Atomicity, Consistency, Isolation, and Durability.
Session 3
Designing Tables. Normalisation. 1NF, 2NF, 3NF, … 6NF
Session 4
Classical RDBMS - PostgreSQL, Oracle, Microsoft SQL Server, MySQL. Transactions. Levels of Transaction Isolation.
Session 5
Advanced Transaction Isolation Levels. Database Indexes.
Session 6
Analytical SQL - GROUP BY, Window Functions. Views. Reporting, BI tools.
Session 7
Document Storage. MongoDB, JSON Store. Data Lake.
Session 8
Key-value Storage. Sharding. Redis. Caching.
Session 9
Data Bus. Kafka. Event-Driven Software Architecture.
Session 10
OLAP Databases. Databases for Analytics. Columnar Storage. Snowflake, BigQuery. DuckDB.
Session 11
Combining Databases. Polyglot Persistence.
Session 12
CAP Theorem. Distributed Systems from a Data Storage Point-of-view.
Session 13
Data Warehouse. Data Modelling for Analytics. Data Vault, Inmon, Kimball, Anchor Modelling.
Session 14
Databases of the future. Serverless concept. Headless concept.
Session 15
Final Quiz
Prerequisites
Python coding experience.
Basic understanding of algorithms or set theory.
(optional) SQL
Methodology
Classes will consist of lectures and discussions on given topics. Each class includes a practical task with the type of database under discussion; over the course, these tasks cover at least five different types (RDBMS, embedded, serverless, key-value, document storage, and OLAP). Students will work in groups on four projects, each studying some aspect of data storage.
Nikolay received his M.S. degree in applied mathematics and cybernetics from Moscow State University, Russia. He then gained 15 years of experience building data platforms for various startups and enterprises. From 2013 until 2019, he headed the Data Platform of Avito, the Craigslist of Russia, which grew from a small startup into a multi-billion-dollar company. At Avito, he was responsible for analytical databases (Vertica, ClickHouse), OLTP engines (PostgreSQL, Redis, MongoDB), and data buses (Kafka) for analytics and microservices. Later he was Head of Data Platform at ManyChat (a California and Barcelona-based SaaS startup), responsible for the implementation and growth of its Data Platform (AWS+Redis+Snowflake+Tableau), which is used for analytics and AI. Currently Nikolay is the CPO of Tengri Data Platform, a startup creating a new analytical database.
Total hours
45 Hours
Dates
Apr 28 - May 16, 2025
Fee for single course
€2999
Fee for degree students
€1999
How to secure your spot
Complete the form below to kickstart your application
Schedule your Harbour.Space interview
If successful, get ready to join us on campus
FAQ
Will I receive a certificate after completion?
Yes. Upon completion of the course, you will receive a certificate signed by the director of the program your course belonged to.
Do I need a visa?
This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.
Can I get a discount?
Yes. The easiest way to enroll in a course at a discounted price is to register for multiple courses. Registering for multiple courses will reduce the cost per individual course. Please ask the Admissions Office for more information about the other kinds of discounts we offer and what you can do to receive one.



