Markdown Files as a Database: Simple, Flexible, and Future-Proof

Databases were designed to work behind the scenes, and using them directly without proper tooling or a user interface is not practical. In contrast, a human-readable format, in this case markdown with front-matter, makes a lot of sense for small amounts of data. Of course, markdown comes with its own trade-offs, specifically in terms of performance and file management. These limitations make it unsuitable for enterprise software, but it might be the best solution for individual users with small amounts of data.

Update 2024-04-14: I don’t use this method for storing links anymore because the files were hard to maintain and I needed a complex way to load files by ID, so I wrote another program to manage my links using a PostgreSQL database! (Yeah, I know…)

Update 2024-08-02: I’ve been using the denote.el package for the past two weeks or so, and I have to say that what Prot made here is more than just a package. It’s an idea for organizing files based on their file names, and I love it.

Background

I’ve had the opportunity to work with various databases and data storage methods in the past, but for the past month I’ve been using markdown to store different kinds of data, and I can say that markdown is the solution I’ve been looking for: practical, easy to use and future-proof.

I’ve used this method to store tasks, lists of links (basically as a bookmark manager), lists of books and movies, notes and more. It’s been nothing but an exceptional experience, and it’s very flexible; I took advantage of this flexibility by writing different tools around it to get things done quickly.

Why I Chose Markdown Over Traditional Databases

Databases are not future-proof due to their reliance on tooling and the server software itself. In contrast, human-readable formats are future-proof and portable because they don’t require any special software to use them. You can use your favorite text editor without any restrictions.

The markdown format, on the other hand, is versatile and, most importantly, simple and easy to parse. Personally, I use a simple tool that I wrote to serialize the different data formats¹ into JSON, plus some POSIX utilities and jq to efficiently extract the fields I need.
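The tool itself isn't shown here, so what follows is a minimal Python sketch of the same idea, not the actual tool. It splits a document into front-matter and body, then prints the front-matter as JSON that jq can consume. It assumes `---` delimiters and only flat `key: value` lines (a tiny subset of YAML). `parse_front_matter` is a hypothetical helper. Real front-matter should go through a proper YAML parser such as PyYAML.

```python
import json


def parse_front_matter(text: str) -> tuple[dict, str]:
    """Split a markdown document into (front-matter dict, body).

    Hypothetical helper: only understands flat `key: value` lines,
    not full YAML (no lists, nesting, or quoting rules).
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter block at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so values like URLs survive.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("unterminated front-matter block")


doc = """---
title: Example Book
type: book
status: reading
---
Notes about the book go here.
"""

meta, body = parse_front_matter(doc)
print(json.dumps(meta))
```

Each document then becomes one JSON object, which jq can filter, for example with `jq -r .title`.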

Using a human-readable format also allows us to use a version control system like git, which lets several people work on the same file at the same time; git merges their changes and only flags a conflict when edits overlap.

No Schemas?

Dealing with complex data types, schemas and migrations is no easy task. Document-based databases changed the whole idea of having a schema by introducing dynamic data without strong typing or a fixed schema.

In the case of yaml, json or toml, most libraries will by default ignore fields that are not defined in the expected data structure when deserializing, which means you can add arbitrary fields without affecting the existing data.
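For example, Go's `encoding/json` ignores unknown keys unless `DisallowUnknownFields` is set on the decoder. Python's standard library does not do this on its own, because passing an unexpected keyword to a dataclass raises `TypeError`. The sketch below mimics the tolerant behavior by keeping only the fields that a hypothetical `Book` record declares. The record, its fields, and the `from_dict` helper are illustrative assumptions.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Book:
    # Hypothetical record type; only these fields are "known".
    title: str
    status: str = "unread"


def from_dict(cls, data: dict):
    """Hypothetical helper: build `cls` from `data`, dropping undeclared keys."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


# `rating` and `tags` were added to the file later; they are simply ignored.
raw = json.loads(
    '{"title": "Example Book", "status": "done", "rating": 5, "tags": ["sci-fi"]}'
)
book = from_dict(Book, raw)
print(book)  # Book(title='Example Book', status='done')

# A record missing an optional field falls back to the default.
older = from_dict(Book, {"title": "Another Book"})
```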

Take advantage of the file-system

There is nothing wrong with storing data in multiple files. Nowadays computers are powerful and memory is cheap, and by using system calls such as mmap(2) we can map an entire file into memory for efficient processing, with the kernel paging it in on demand. Additionally, the operating system's page cache keeps recently read files in memory, which further improves performance.
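Here is a minimal sketch of memory-mapping a single note with Python's `mmap` module, which wraps mmap(2) on POSIX systems. The note and its file name are hypothetical. The sketch creates the note in a temporary directory so it runs on its own, and it assumes the file starts with a `---` front-matter block.

```python
import mmap
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "note.md"  # hypothetical example note
    path.write_text("---\ntitle: Example\n---\nBody text.\n", encoding="utf-8")

    with path.open("rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            has_front_matter = mm[:4] == b"---\n"
            # find() searches the mapped bytes directly, no str copy of the file.
            end = mm.find(b"\n---\n", 4)
            if end == -1:
                raise ValueError("no closing front-matter delimiter")
            header = mm[4:end].decode("utf-8")

print(has_front_matter, repr(header))
```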

Loading files into memory is practical because markdown documents are compact, typically under 1 MB each, so even loading many of them at once needs few resources.
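Here is a sketch of loading a whole collection at once. It assumes the notes are `*.md` files in a single directory, which is a hypothetical layout. The sketch creates three tiny notes in a temporary directory, reads them all into a dict keyed by file name, and sums their size. For personal data sets this total tends to stay small.

```python
import tempfile
from pathlib import Path


def load_all(directory: Path) -> dict[str, str]:
    """Read every markdown file in `directory` into memory, keyed by file name."""
    return {
        p.name: p.read_text(encoding="utf-8")
        for p in sorted(directory.glob("*.md"))
    }


with tempfile.TemporaryDirectory() as tmp:
    notes_dir = Path(tmp)
    # Create a few hypothetical notes so the example is self-contained.
    for i in range(3):
        (notes_dir / f"note-{i}.md").write_text(
            f"---\nid: {i}\n---\nNote {i}\n", encoding="utf-8"
        )

    notes = load_all(notes_dir)
    total_bytes = sum(len(text.encode("utf-8")) for text in notes.values())

print(len(notes), "files,", total_bytes, "bytes")
```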

Why not just use something like `json`?

That’s a good question: why not just use something like json, yaml or toml? Well, you can! I chose markdown because I had tooling around it to manage documents, and I often use the body of the document for taking small notes.

Since the front-matter itself is serialized in one of these formats, there is essentially no difference between the two. To be honest, using these formats directly is better, since you no longer have to extract the front-matter and can simply deserialize the whole file.
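To illustrate the difference, here is the same hypothetical record in two forms. One is a standalone JSON file. The other is markdown with JSON front-matter between `---` lines, a delimiter assumed for this sketch; tools differ in which delimiter they expect. The JSON file deserializes in a single call, while the markdown version has to be split first.

```python
import json

json_file = '{"title": "Example Book", "status": "done"}'

markdown_file = """---
{"title": "Example Book", "status": "done"}
---
Small notes about the book.
"""

# Standalone JSON: deserialize the whole file directly.
from_json = json.loads(json_file)

# Markdown: first cut out the text between the two `---` delimiters.
# maxsplit=2 keeps any later `---` lines inside the body intact.
_, front_matter, body = markdown_file.split("---\n", 2)
from_markdown = json.loads(front_matter)

print(from_json == from_markdown)  # True
```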

Wrapping things up #

I want to emphasize that I’m not trying to convince you to use this method, and honestly, I don’t really care if you do. Instead, I just wanted to explore new ways of managing data and share my thoughts so you can build a system you’re comfortable with on top of them. At the end of the day, what matters most is getting things done and enjoying the process.

¹ The “standard” front-matter format is yaml, which was first used in the Jekyll static site generator, but nowadays you can serialize the data in other formats such as json and toml (the only difference is the front-matter delimiter).