Databases were designed to work behind the scenes, and using them directly without proper tooling or a user interface is not practical. In contrast, using a human-readable format, in this case markdown (with front-matter), makes a lot of sense for small amounts of data. Of course markdown comes with its own trade-offs, specifically in terms of performance and file management. This format clearly has significant limitations that make it unsuitable for enterprise software, but it might be the best solution for individual users with a small amount of data.
Update 2024-04-14: I don’t use this method for storing links anymore because it was hard to maintain the files and I needed a complex way to load files by ID, so I wrote another program to manage my links using a PostgreSQL database! (Yeah I know…)
Update 2024-08-02: I’ve been using the denote.el package for the past two weeks or so, and I have to say that what Prot made here is more than just a package. It’s an idea of organizing files based on their file names, and I love it.
Background #
I’ve had the opportunity to work with various databases and data storage methods in the past, but for the past month I’ve been using the markdown format to store different kinds of data, and I can say that markdown is the solution I’ve been looking for: practical, easy to use and future-proof.
I’ve used this method to store tasks, lists of links (basically as a bookmark manager), lists of books and movies, notes and more… It’s been nothing but an exceptional experience. It’s very flexible, and I took advantage of this flexibility by writing different tools around it to get things done quickly.
Why I Chose Markdown Over Traditional Databases #
Databases are not future-proof because they rely on tooling and on the server software itself. In contrast, human-readable formats are future-proof and portable, as they don’t require any special software to use them. You can use your favorite text editor, and there are no restrictions.
The markdown format, on the other hand, is versatile and, most importantly, simple and easy to parse. Personally, I use a simple tool that I wrote for serializing the different formats of data[1] into json, plus some POSIX utilities and jq, to extract the fields that I need efficiently.
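As a rough illustration of that pipeline, here is a minimal Python sketch that splits a markdown file into front-matter and body and prints the result as json. It is not the author's tool: the function names and the `meta`/`body` keys are made up for this example, and it only understands flat `key: value` lines between `---` delimiters, which is a tiny subset of yaml.

```python
import json


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a markdown document into (front-matter fields, body).

    Assumption: the front-matter contains only flat `key: value` lines.
    This is a small subset of yaml, used to keep the sketch dependency-free.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter block at all

    fields: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":  # closing delimiter found
            body = "\n".join(lines[index + 1:])
            return fields, body
        # Split on the first colon only, so values such as URLs stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}, text  # opening delimiter without a closing one


def to_json(text: str) -> str:
    """Serialize the document as json (hypothetical `meta`/`body` layout)."""
    fields, body = split_front_matter(text)
    return json.dumps({"meta": fields, "body": body})


SAMPLE = """---
title: Example bookmark
url: https://example.com
---
A short note about the link.
"""

if __name__ == "__main__":
    print(to_json(SAMPLE))
```

With output shaped like this, a jq filter such as `.meta.url` would pull out a single field.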
Using a human-readable format also lets you use a version control system like git, which has the advantage of letting multiple people collaborate on the same file, with git merging their changes and flagging any conflicts.
No Schemas? #
Dealing with complex data types, schemas and migrations is no easy task. Document-based databases changed the whole idea of having a schema by introducing dynamic data without strong typing or a fixed schema.
In the case of yaml, json or toml serialization and deserialization, most libraries will ignore fields that are not defined in the expected data structure, which means you can add arbitrary fields and it won’t affect the existing data.
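To make the idea concrete, here is a small Python sketch of loading a record while ignoring fields the program does not know about. The `Bookmark` type and `load_bookmark` helper are hypothetical. Note that Python dataclasses do not drop unknown fields on their own, so the sketch filters them explicitly to get the behaviour described above.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Bookmark:
    """Hypothetical record type; only these fields are 'expected'."""
    title: str
    url: str


def load_bookmark(raw: str) -> Bookmark:
    """Build a Bookmark from json, silently dropping unknown fields."""
    data = json.loads(raw)
    known = {f.name for f in fields(Bookmark)}
    # Keep only the keys the dataclass declares; extras are ignored.
    return Bookmark(**{k: v for k, v in data.items() if k in known})


if __name__ == "__main__":
    # "tags" and "rating" were added later and are not part of Bookmark.
    raw = '{"title": "Example", "url": "https://example.com", "tags": ["a"], "rating": 5}'
    print(load_bookmark(raw))
```

Older tools keep working after new fields are added to the files, which is what makes the lack of a schema tolerable in practice.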
Take advantage of the file-system #
There is nothing wrong with storing data in multiple files. Nowadays computers are powerful and memory is cheap, and by using system calls such as mmap(2) we can map an entire file into memory for efficient processing. Additionally, operating systems provide caching and other optimizations to further enhance performance.
Loading files into memory is practical because Markdown documents are compact, typically under 1MB, so even loading many of them at once takes few resources.
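For illustration, here is a minimal Python sketch that maps a file into memory with the standard `mmap` module and searches it without reading it into a separate buffer first. The `count_occurrences` helper is hypothetical and only meant to show the shape of the approach.

```python
import mmap
from pathlib import Path


def count_occurrences(path: Path, needle: bytes) -> int:
    """Count non-overlapping occurrences of `needle` in a memory-mapped file."""
    if not needle:
        raise ValueError("needle must not be empty")
    if path.stat().st_size == 0:
        return 0  # an empty file cannot be memory-mapped
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = 0
            position = mapped.find(needle)
            while position != -1:
                count += 1
                position = mapped.find(needle, position + len(needle))
            return count
```

For example, `count_occurrences(Path("tasks.md"), b"- [ ]")` would count the open checkboxes in a hypothetical task file.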
Why not just use something like json? #
That’s a good question: why not just use something like json, yaml or toml? Well, you can! The reason I chose markdown is that I already had tooling around it to manage documents, and I often use the body of the document for taking small notes.
Since the data itself is basically serialized in one of these formats, there is no real difference between the two. And to be honest, using these formats directly is better, since you don’t have to parse the front-matter anymore and can just deserialize the whole file.
Other Links and Tools #
I’ve written a simple tool called frontmatter for extracting the front-matter and the content of a markdown file.
There is also the popular markdown editor Obsidian, and with the help of the Dataview plugin you can easily query your data with a SQL-like syntax. I’ve used Obsidian in the past. It’s a great tool, but it has many features that I don’t ever plan to use, so I just stick with a simple text editor and command-line tools to get things done.
Wrapping things up #
I want to emphasize that I’m not trying to convince you to use this method, and honestly, I don’t really care if you do. Instead, I just wanted to explore new ways of managing data and share my thoughts about them, so you can build a system that you’re comfortable with on top of it. At the end of the day, what matters most is getting things done and enjoying the process.
[1] The “standard” front-matter format is yaml, which was first used in the jekyll static site generator. But nowadays you can serialize the data in different formats such as json and toml (the only difference is the front-matter delimiter).