logo
episode-header-image
Feb 2024
44m 57s

The Internals of MongoDB

Hussein Nasser
About this episode

https://backend.win

https://databases.win


I’m a big believer that database systems share similar core fundamentals at their storage layer and understanding them allows one to compare different DBMS objectively. For example, How documents are stored in MongoDB is no different from how MySQL or PostgreSQL store rows. 

Everything goes to pages of fixed size and those pages are flushed to disk. 


Each database define page size differently based on their workload, for example MongoDB default page size is 32KB, MySQL InnoDB is 16KB and PostgreSQL is 8KB.


The trick is to fetch what you need from disk efficiently with as fewer I/Os as possible, the rest is API.  


In this video I discuss the evolution of MongoDB internal architecture on how documents are stored and retrieved focusing on the index storage representation. I assume the reader is well versed with fundamentals of database engineering such as indexes, B+Trees, data files, WAL etc, you may pick up my database course to learn the skills.

Let us get started.

Up next
Jun 13
kTLS - Kernel level TLS
Fundamentals of Operating Systems Course https://oscourse.winktls is brilliant.TLS encryption/decryption often happens in userland. While TCP lives in the kernel. With ktls, userland can hand the keys to the kernel and the kernel does crypto. When calling write, the kernel encryp ... Show More
22m 55s
May 9
The beauty of the CPU
If you are bored of contemporary topics of AI and need a breather, I invite you to join me to explore a mundane, fundamental and earthy topic.The CPU.A reading of my substack article https://hnasr.substack.com/p/the-beauty-of-the-cpu 
9m 38s
Apr 18
Sequential Scans in Postgres just got faster
This new PostgreSQL 17 feature is game changer. They know can combine IOs when performing sequential scan. Grab my database coursehttps://courses.husseinnasser.com 
27m 36s
Recommended Episodes
Mar 2023
Moving up a level of abstraction with serverless on MongoDB Atlas and AWS
The history of computing has been a story of moving up levels of abstraction: from hard-coding algorithms and directly manipulating memory addresses with assembly languages to using more natural language constructs in high-level general purpose languages to abstracting the hardwa ... Show More
26m 8s
Jun 2023
#420: Database Consistency & Isolation for Python Devs
See the full show notes for this episode on the website at talkpython.fm/420 
56m 2s
Nov 2021
MongoDB: The Database Platform - [Business Breakdowns, EP. 33]
I’m Jesse Pujji and today we’re breaking down MongoDB. The MongoDB story traces back to 2007 when the founding team was running DoubleClick, a large adtech business now owned by Google. They could not find an existing database software with the agility and scalability that the in ... Show More
45m 8s
Mar 2020
Easier Stream Processing On Kafka With ksqlDB
Summary Building applications on top of unbounded event streams is a complex endeavor, requiring careful integration of multiple disparate systems that were engineered in isolation. The ksqlDB project was created to address this state of affairs by building a unified layer on top ... Show More
43m 36s
Sep 2021
S17:E9 - What are some database architectures and their use cases (Kyle Bernhardy)
In this episode, we talk about database architectures and some of their use cases, with Kyle Bernhardy, CTO of HarperDB. Kyle talks about what a database is, different types of databases, and when you might want to use one type of database over another. Show Links DevDiscuss (spo ... Show More
48m 31s
Oct 2022
Going From Transactional To Analytical And Self-managed To Cloud On One Database With MariaDB
Summary The database market has seen unprecedented activity in recent years, with new options addressing a variety of needs being introduced on a nearly constant basis. Despite that, there are a handful of databases that continue to be adopted due to their proven reliability and ... Show More
52m 4s
Aug 2021
#467: [INTRODUCING] Amazon MemoryDB for Redis
Amazon MemoryDB for Redis is the newest fully managed database service from AWS. Today, Nikki is joined by Zach Gardner, Specialist Solutions Architect at AWS, to introduce this new Redis-compatible, durable, in-memory database service. Learn why we built MemoryDB and dive into b ... Show More
29m 36s
Feb 2023
Shorten the distance between production data and insight
Modern networked applications generate a lot of data, and every business wants to make the most of that data. Most of the time, that means moving production data through some transformation process to get it ready for the analytics process. But what if you could have in-app analy ... Show More
20m 27s
Apr 2022
Postgres.js
Rasmus Porsager created Postgres.js –the fastest full-featured PostgreSQL client for Node.js and Deno. Today he joins Jerod for a deep-dive on Postgres, why he created this open source library, and how you can use it to build pg-backed JavaScript applications. Discuss on Changelo ... Show More
50m 6s
May 2023
The ORMazing show
Nick & KBall sit down with the brilliant Stephen Haberman to discuss all things ORMs! 💻🔍 From the advantages and disadvantages of ORMs in general, to delving into the intricacies of his innovative project Joist, which brings a fresh, idiomatic, ActiveRecord-esque approach to Ty ... Show More
1h 12m