Feb 2024
44m 57s

The Internals of MongoDB

Hussein Nasser
About this episode

https://backend.win

https://databases.win


I’m a big believer that database systems share similar core fundamentals at their storage layer, and understanding them allows one to compare different DBMSs objectively. For example, how documents are stored in MongoDB is no different from how MySQL or PostgreSQL store rows.

Everything goes to pages of fixed size and those pages are flushed to disk. 


Each database defines its page size differently based on its workload. For example, the default page size is 32KB in MongoDB, 16KB in MySQL InnoDB, and 8KB in PostgreSQL.


The trick is to fetch what you need from disk efficiently, with as few I/Os as possible. The rest is API.
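To make the page idea concrete, here is a minimal Python sketch of the arithmetic involved. It is a simplification, not how any of these engines is implemented: it treats a data file as a flat run of fixed-size pages and ignores page headers, compression, and per-record overhead. The function names `pages_touched` and `records_per_page` are hypothetical, and the page sizes are the defaults quoted above.

```python
# Default page sizes quoted in the text, in bytes.
PAGE_SIZES = {
    "MongoDB": 32 * 1024,
    "MySQL InnoDB": 16 * 1024,
    "PostgreSQL": 8 * 1024,
}


def pages_touched(offset: int, length: int, page_size: int) -> int:
    """Count the fixed-size pages covering bytes [offset, offset + length).

    Assumption: the file is a flat sequence of pages with no headers.
    """
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def records_per_page(record_size: int, page_size: int) -> int:
    """Upper bound on fixed-size records per page, ignoring all overhead."""
    return page_size // record_size


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        # How many hypothetical 200-byte records a single page read can return.
        print(name, records_per_page(200, size))
```

Under these assumptions, a larger page returns more records per read, but it also pulls in more bytes when only one record is needed, which is one way to see why the defaults differ across systems.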


In this video I discuss the evolution of MongoDB's internal architecture, covering how documents are stored and retrieved, with a focus on the index storage representation. I assume the reader is well versed in the fundamentals of database engineering, such as indexes, B+Trees, data files, and WAL. You may pick up my database course to learn these skills.

Let us get started.
