On data-driven programming
Say we’re part of a startup’s founding team and are responsible for designing and building the core of its backend services. In essence, our job is to extract, transform and load values.
In general, the backend services must:
- receive values from the mobile app, e.g., a JSON payload sent to a REST endpoint
- perform pure logic over values, e.g., apply f(balance, purchase) → new_balance
- persist values in the database
- and so on
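To make the second bullet concrete, here is a minimal Kotlin sketch of such a pure function. `Purchase` and `applyPurchase` are hypothetical names, amounts are kept in cents (as `Long`) to avoid floating-point rounding, and the two `require` checks are assumed business rules.

```kotlin
// Hypothetical value type: just data, no behavior.
data class Purchase(val item: String, val priceInCents: Long)

// f(balance, purchase) -> new_balance
// Pure: the result depends only on the arguments; no I/O, no mutation.
fun applyPurchase(balanceInCents: Long, purchase: Purchase): Long {
    require(purchase.priceInCents >= 0) { "price must be non-negative" }
    require(balanceInCents >= purchase.priceInCents) { "insufficient balance" }
    return balanceInCents - purchase.priceInCents
}

fun main() {
    val newBalance = applyPurchase(10_000, Purchase("coffee", 450))
    println(newBalance) // prints 9550
}
```

Since it neither reads from the network nor writes to the database, it can be tested with plain values; receiving the payload and persisting the result stay at the edges of the service.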
Everything else is overhead: getters and setters, Gang of Four patterns and whatnot. Sure, these abstractions may help you scale a codebase or a team. But they are still overhead. They’re neither inherently part of our startup’s business model nor the reason the CEO decided to hire us.
I guess this philosophy of treating values as first-class citizens led to the code-as-data movement, or data-driven programming. Think of DSLs and configuration files. If we want to change our app’s behavior, there’s no need to launch IntelliJ and dive into the code: we can fiddle with some JSON-like file in a notepad app. The code parses it and acts accordingly. Profit.
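A minimal Kotlin sketch of the idea: the behavior of a hypothetical welcome screen is decided by a text file read at runtime, not by the code. To stay within the standard library, the file uses a made-up `key=value` format instead of JSON; `WelcomeScreenConfig`, `parseConfig`, `toWelcomeScreen` and `render` are illustrative names.

```kotlin
// Hypothetical config-driven screen: the data decides the behavior.
data class WelcomeScreenConfig(val title: String, val showBanner: Boolean)

// Parses "key=value" lines, skipping blank lines and '#' comments.
fun parseConfig(text: String): Map<String, String> =
    text.lines()
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.startsWith("#") }
        .associate { line ->
            require('=' in line) { "malformed line: $line" }
            val (key, value) = line.split("=", limit = 2)
            key.trim() to value.trim()
        }

// Missing keys fall back to defaults; unknown keys are silently ignored.
fun toWelcomeScreen(config: Map<String, String>) = WelcomeScreenConfig(
    title = config["title"] ?: "Welcome",
    showBanner = config["showBanner"] == "true"
)

// The code "acts accordingly": its output depends only on the parsed values.
fun render(screen: WelcomeScreenConfig): String =
    if (screen.showBanner) "[BANNER] ${screen.title}" else screen.title

fun main() {
    val text = """
        # edited in a notepad app, no recompilation needed
        title = Hello!
        showBanner = true
    """.trimIndent()
    println(render(toWelcomeScreen(parseConfig(text)))) // prints [BANNER] Hello!
}
```

Flipping `showBanner` to `false` in the file changes what `render` returns without touching, or recompiling, the Kotlin code. Note that a typo such as `showBaner` is silently ignored and falls back to the default: a first taste of the "no type-safety" problem.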
The benefits of such an approach are undeniable, but it introduces new problems. The main ones, which I’m gonna cover in this post, are:
- entry barrier
- no type-safety
Let’s take a Kubernetes config as an example:
The entry barrier is the time it takes to understand this syntax. In some sense, it’s a new language: it’s good old YAML, but there are rules we need to learn before reading or writing such pieces of text.
By no type-safety, I mean all the goodies we would otherwise have in an IDE for a strongly typed language: autocomplete, IntelliSense and references to previously declared values.
For Kubernetes I bet there are 3761 tools to make your life easier. But what about the DSLs we create?
To illustrate, let’s suppose our job is to create a DSL/config file for a backend-driven UI framework, i.e., a system that lets front-end applications build their screens and flows from backend responses. Spotify does that, by the way.
With this system, in theory only one person needs to know the inner machinery of this code. But if a new intern wants to build a new screen, how will she know what to write in our DSL in the first place?
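To make the intern’s problem concrete, here is a hypothetical vocabulary for such a DSL, sketched in Kotlin: a screen is a tree of components sent by the backend and drawn by the client. `Component`, `Text`, `Button`, `Column` and the `action` strings are made up for illustration, and `render` just prints an indented tree instead of creating native views.

```kotlin
// Hypothetical vocabulary of a backend-driven UI DSL. A real framework would
// decode these from the backend's JSON response; here they are built directly.
sealed class Component
data class Text(val value: String) : Component()
data class Button(val label: String, val action: String) : Component()
data class Column(val children: List<Component>) : Component()

// Stand-in for the client-side renderer: prints an indented tree
// instead of creating native views.
fun render(component: Component, depth: Int = 0): String {
    val indent = "  ".repeat(depth)
    return when (component) {
        is Text -> "${indent}Text(${component.value})"
        is Button -> "${indent}Button(${component.label} -> ${component.action})"
        is Column -> (listOf("${indent}Column") + component.children.map { render(it, depth + 1) })
            .joinToString("\n")
    }
}

fun main() {
    val screen = Column(listOf(Text("Your balance"), Button("Top up", "open:top_up")))
    println(render(screen))
    // Column
    //   Text(Your balance)
    //   Button(Top up -> open:top_up)
}
```

Whoever reads this code knows exactly which components exist. The intern writing the JSON that the backend serves, however, only sees a blank file.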
Creating an editor UI
In my opinion, this is the best approach. We could have a web editor with drag-and-drop elements, drop-downs and even a live preview window that shows what the final result would look like.
Discoverability is addressed by these UI elements that pop up in front of us. Correctness is guaranteed by the UI itself, which prevents us from clicking on anything that would result in an invalid file.
But this has some cons:
- it’s expensive to write — in terms of time, at least
- it requires a full-stack team
- it doesn’t solve the code review problem: if the UI spits out a file such as the Kubernetes one, code review is generally limited to LGTM or 🙈, as in “I trust that you used the UI correctly and I trust that the UI is correct”.
Making use of our IDEs
What if we wrote a VS Code plugin, for example, that somehow helps other developers write our DSLs? But wait, we’re already using a popular format such as JSON or YAML. We could instead write a plugin that helps users write JSON files that follow some convention.
But there are millions of other developers out there. I bet someone has already thought about this and taken the time to do exactly what we want. I’m lazy, and our job in the startup is to persist values in the database and whatnot. I, for one, don’t want to put effort into this.
By using a battle-tested solution, we get all of this for free, plus support for all major IDEs.
There are a lot of articles on the Internet about the benefits and capabilities of JSON Schema, so I won’t bother you with details. I’ll simply demonstrate it with a GIF:
The schema used in this demo was extracted from react-jsonschema-form.
I know that a popup window appearing as we type sounds like a small benefit, but if you stop to think about it, the intern didn’t even have to leave the file buffer! She didn’t need to browse through the implementation, read documents (which are likely outdated or incomplete) or contact the people who contributed to the DSL.
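Besides powering the editor popups, a schema states exactly what a valid file looks like, so the same rules can be checked outside the IDE too. To make three of the most basic keywords concrete (`type`, `required` and `properties`), here is a tiny hand-rolled Kotlin validator. It is only an illustration, not a substitute for a real JSON Schema validator, and the schema for a `Text` component is made up.

```kotlin
// A deliberately tiny subset of JSON Schema validation: only "type",
// "required" and "properties". JSON is modeled as Kotlin maps, lists,
// strings, numbers, booleans and null.
fun typeMatches(expected: String, value: Any?): Boolean = when (expected) {
    "object" -> value is Map<*, *>
    "array" -> value is List<*>
    "string" -> value is String
    "number" -> value is Number
    "boolean" -> value is Boolean
    "null" -> value == null
    else -> true // other types are not checked in this sketch
}

// Returns a list of human-readable errors; empty means "valid".
@Suppress("UNCHECKED_CAST")
fun validate(schema: Map<String, Any?>, value: Any?, path: String = "\$"): List<String> {
    val type = schema["type"] as? String
    if (type != null && !typeMatches(type, value)) return listOf("$path: expected $type")
    if (value !is Map<*, *>) return emptyList()

    val obj = value as Map<String, Any?>
    val errors = mutableListOf<String>()
    // "required": every listed property must be present.
    val required = (schema["required"] as? List<*>).orEmpty().filterIsInstance<String>()
    for (key in required) {
        if (key !in obj) errors += "$path: missing required property '$key'"
    }
    // "properties": validate each present property against its sub-schema.
    val properties = (schema["properties"] as? Map<String, Any?>).orEmpty()
    for ((key, subSchema) in properties) {
        if (key in obj && subSchema is Map<*, *>) {
            errors += validate(subSchema as Map<String, Any?>, obj[key], "$path.$key")
        }
    }
    return errors
}

fun main() {
    // Made-up schema for a Text component of a hypothetical UI DSL.
    val schema = mapOf(
        "type" to "object",
        "required" to listOf("value"),
        "properties" to mapOf(
            "value" to mapOf("type" to "string"),
            "size" to mapOf("type" to "number")
        )
    )
    println(validate(schema, mapOf("value" to 42))) // prints [$.value: expected string]
    println(validate(schema, mapOf("size" to 14)))  // prints [$: missing required property 'value']
}
```

The error messages point at the offending property, which is roughly what the editor underlines while the intern types.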
There are ways to extend this to YAML as well: YAML language servers, for instance, can validate YAML files against a JSON Schema.
Oh, one suggestion: don’t write both your JSON Schema and your structs/classes by hand. Let one generate the other so that there’s a single source of truth.
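One way to get that single source of truth, sketched in Kotlin: declare the fields once and generate the schema text from that declaration. `FieldSpec`, `JsonType` and `jsonSchemaOf` are hypothetical helpers; a real project would more likely reach for an existing schema or code generator, and would also need to escape names properly.

```kotlin
// Hypothetical field declarations: the one place where a field is defined.
enum class JsonType(val schemaName: String) { STRING("string"), NUMBER("number"), BOOLEAN("boolean") }

data class FieldSpec(val name: String, val type: JsonType, val required: Boolean = true)

// Generates a JSON Schema object from the field list.
// Assumes names are plain identifiers, so no JSON string escaping is done.
fun jsonSchemaOf(fields: List<FieldSpec>): String {
    val properties = fields.joinToString(", ") { f ->
        "\"${f.name}\": {\"type\": \"${f.type.schemaName}\"}"
    }
    val required = fields.filter { it.required }.joinToString(", ") { "\"${it.name}\"" }
    return "{\"type\": \"object\", \"properties\": {$properties}, \"required\": [$required]}"
}

fun main() {
    val textComponent = listOf(
        FieldSpec("value", JsonType.STRING),
        FieldSpec("size", JsonType.NUMBER, required = false)
    )
    println(jsonSchemaOf(textComponent))
    // {"type": "object", "properties": {"value": {"type": "string"}, "size": {"type": "number"}}, "required": ["value"]}
}
```

The same `FieldSpec` list could also drive the code side, so adding a field means editing exactly one place.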
Using data as code (as data)
The cool thing about having a DSL inside a JSON file is that it can, in general, be parsed at runtime: we don’t need to deploy a new server instance and stop the previous one. To update the app’s behavior, we can simply run:
curl -d "@config.json" -XPOST localhost:3000/apply
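Here is one way the receiving end of that curl command could look, sketched in Kotlin with the JDK’s built-in `com.sun.net.httpserver` package: the current config sits in an `AtomicReference` and `POST /apply` swaps it in place, without restarting the process. `ConfigHolder`, `startServer` and the chosen status codes are assumptions for illustration, and the body is stored as raw text where a real service would parse and validate it first.

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress
import java.util.concurrent.atomic.AtomicReference

// Holds the current config; AtomicReference makes reads and swaps safe
// across threads (the HTTP handler and the rest of the app).
class ConfigHolder(initial: String) {
    private val current = AtomicReference(initial)

    fun get(): String = current.get()

    // Replaces the config; blank payloads are rejected. A real service would
    // parse and validate the body (e.g., against its JSON Schema) before swapping.
    fun update(body: String): Boolean {
        if (body.isBlank()) return false
        current.set(body)
        return true
    }
}

// POST /apply swaps the config in memory; no redeploy or restart involved.
fun startServer(holder: ConfigHolder, port: Int = 3000): HttpServer {
    val server = HttpServer.create(InetSocketAddress(port), 0)
    server.createContext("/apply") { exchange ->
        val status = when {
            exchange.requestMethod != "POST" -> 405
            holder.update(exchange.requestBody.readBytes().toString(Charsets.UTF_8)) -> 204
            else -> 400
        }
        exchange.sendResponseHeaders(status, -1) // -1 means "no response body"
        exchange.close()
    }
    server.start()
    return server
}
```

After `startServer(ConfigHolder("{}"))`, running the curl command above replaces the stored config, and every later `get()` call sees the new value.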
But if we don’t need such dynamism (e.g., if updating this config file requires opening a PR in the same repository as the code, which leads to a new image build), then why not stay inside the code realm?
We’ll need to refactor packages a little so that, by convention, a given set of files is treated as data, as opposed to implementation-detail code. It seems like a small price to pay, though.
This is exactly what Anko does in order to build Android UIs, for example.
Assuming we wrote this in Kotlin, chances are that other developers in the company are already familiar with the language, so they will feel at home with this syntax. No extra level of indirection is introduced.
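In that spirit, here is a minimal Kotlin type-safe builder for a hypothetical backend-driven UI. It is not Anko’s actual API: `screen`, `ScreenBuilder`, `Screen`, `Text` and `Button` are made-up names, and the result is plain data that a renderer could consume. A file containing only declarations like `balanceScreen` plays the role of the “set of files considered as data”.

```kotlin
// Hypothetical UI vocabulary (not Anko's API).
sealed class Component
data class Text(val value: String) : Component()
data class Button(val label: String, val action: String) : Component()

data class Screen(val title: String, val components: List<Component>)

// Prevents accidentally calling an outer builder from a nested lambda.
@DslMarker
annotation class ScreenDsl

@ScreenDsl
class ScreenBuilder(private val title: String) {
    private val components = mutableListOf<Component>()

    fun text(value: String) { components += Text(value) }
    fun button(label: String, action: String) { components += Button(label, action) }

    fun build() = Screen(title, components.toList())
}

// Entry point of the DSL: the lambda runs with a ScreenBuilder as its receiver.
fun screen(title: String, block: ScreenBuilder.() -> Unit): Screen =
    ScreenBuilder(title).apply(block).build()

// "Data" file: it only declares screens, checked by the compiler.
val balanceScreen = screen("Balance") {
    text("Your balance")
    button(label = "Top up", action = "open:top_up")
}
```

A typo such as `buton(...)` or a forgotten `action` argument is now a compile error, and the IDE autocompletes `text` and `button` inside the block: precisely the type-safety and discoverability the JSON file lacked.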