Writing multi-module, monolithic apps with graph APIs
Last year, I talked about monolithic architecture that enables easy microservice splitting later.
After applying it for a large codebase, I’ve started using another approach: graph APIs.
In a nutshell, I believe it’s fundamental to isolate different parts of an application into different modules, even if they are deployed as a single monolith at the end of the day.
Limiting the cognitive load required to navigate the code base is a must.
Writing scalable and readable code on the first try is tough. If you have time and effort constraints (as I have for personal projects: I don’t want to spend hundreds of hours developing them), it’s even harder.
Unreadable, difficult to maintain code is a red flag. But if it works, is isolated and no one needs to look at it anymore, it’s less bad. Sure, one day it will stop working or will need to be updated, but until then people will have better context and know what works and what doesn’t. If it’s contained, it may even be plausible to rewrite it from scratch without huge costs.
I needed to change the provider for stock historical values and I only had to look at a single package, because no implementation details had leaked into other parts of the code. Even if the codebase had 1MM LOC for all sorts of features, I knew that I (or anyone else) would only need to look at, and try to understand, ~50 LOC.
However, this isolation approach has some downsides:
It’s too verbose
One module of mine only had pure functions, summing up to ~20 LOC. However, in order to isolate it, I had to create a [defrecord](https://clojuredocs.org/clojure.core/defrecord), implement some lifecycle functions and so on, even though it was stateless.
The interfaces don’t scale well
`get-stock-history` is fine. But when you start pushing it towards `get-new-stocks-with-high-volatility`, things can get out of control.
I wanted to get the historical values of all the investments available at my bank. One module was able to fetch my bank and get the names of the investments. Another one, which queries another API, was able to convert a name to an ID. Finally, a third module was able to get the history given an ID.
Where to put this composition? I wanted to make things transparent (the fact that my bank doesn’t expose the IDs shouldn’t leak to outer modules), so I made the bank module depend on the second one.
That worked. But in another flow the second module relied on the bank one. If we’re not careful enough, circular dependency may happen. This interdependency is difficult to reason about.
Another solution is to have a higher-level module that knows everyone else: a sort of BFF/façade. That’s better, but it still needs to know that, e.g., the bank module doesn’t expose the IDs, so an additional query is needed.
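To make the coupling concrete, here’s a sketch of what that façade composition might look like (all module and function names are hypothetical):

```clojure
(ns facade.controllers
  (:require [bank.controllers :as bank]
            [ids.controllers :as ids]
            [history.controllers :as history]))

;; The façade must know that the bank doesn't expose IDs,
;; so it wires the extra name->id lookup itself.
(defn investments-with-history [env]
  (for [name (bank/investment-names env)]             ; module 1: bank scraping
    (let [id (ids/name->id env name)]                 ; module 2: name -> ID
      {:investment/name    name
       :investment/history (history/by-id env id)}))) ; module 3: ID -> history
```

Every new traversal like this one means another hand-written composition in the façade.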
What if we could have looser dependencies? What if no one needed to know how to surgically compose different modules to deliver a single response? With Graph APIs we can.
To implement my example, the bank module could register a resolver like the one below (I’m stripping away some boilerplate for readability):
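A sketch of such a resolver using Pathom’s connect API (`fetch-investment-names` is a hypothetical function that queries the bank):

```clojure
(ns bank.graph
  (:require [com.wsscode.pathom.connect :as pc]))

(declare fetch-investment-names) ; hypothetical bank-scraping call

;; Exposes the list of investments; only their names are known here.
(pc/defresolver investments [env _]
  {::pc/output [{:bank/investments [:investment/name]}]}
  {:bank/investments (fetch-investment-names env)})
```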
For the second module:
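A sketch along the same lines (`name->id` stands for the hypothetical call to the other API):

```clojure
(ns ids.graph
  (:require [com.wsscode.pathom.connect :as pc]))

(declare name->id) ; hypothetical call to the other API

;; Given an investment name, resolve its provider ID.
(pc/defresolver investment-id [env {:investment/keys [name]}]
  {::pc/input  #{:investment/name}
   ::pc/output [:investment/id]}
  {:investment/id (name->id env name)})
```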
Finally, for the last module:
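Again a sketch, with `fetch-history` as a hypothetical call to the history provider:

```clojure
(ns history.graph
  (:require [com.wsscode.pathom.connect :as pc]))

(declare fetch-history) ; hypothetical call to the history provider

;; Given an investment ID, resolve its full history.
(pc/defresolver investment-history [env {:investment/keys [id]}]
  {::pc/input  #{:investment/id}
   ::pc/output [:investment/history]}
  {:investment/history (fetch-history env id)})
```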
I now have a decentralized system. The graph parser knows by itself that, for each element in `:bank/investments`, it needs to go from `:investment/name` to `:investment/id` to reach `:investment/history`, and how to resolve each step.
As for the namespace organization, each module has the following structure:
- `logic/` contains pure functions
- `db/` is one of many ports in the hexagonal architecture
- `graph/` has the resolvers above
  - when the resolvers get complicated, helper functions are extracted to `controllers/`, which orchestrate function calls
- `definition.clj` is a file I already had to bootstrap the server, but it has now been extended to each module
An example of such a config, where:
- `:bank` has immutable data that can be propagated to other components
- `:components` has some dependency-injection declarations
- `:entry-point` is the function to be called in case this module is used standalone or as the routing hub
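A minimal sketch of the bank module’s `definition.clj`, with illustrative values for each key:

```clojure
(ns bank.definition
  (:require [bank.graph :as graph]))

(declare start!) ; hypothetical standalone bootstrap function

(def config
  {:bank        {:base-url "https://bank.example.com"} ; immutable data
   :components  {:http-client {:timeout-ms 5000}}      ; DI declarations
   :resolvers   [graph/investments]                    ; this module's resolvers
   :entry-point start!})                               ; standalone / routing hub
```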
Finally, I have a [defmethod](https://clojuredocs.org/clojure.core/defmethod) that’s able to reduce a config vector into a final, single config. For `:http`, for example, it merges all the bookmarks; for `:resolvers`, it concatenates all vectors; for `:entry-point`, it keeps the last one.
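The reduction could be sketched like this (the key set and value shapes are assumptions based on the description above):

```clojure
;; One defmethod per config key decides how two values are combined.
(defmulti merge-key (fn [k _old _new] k))

(defmethod merge-key :http        [_ old new] (merge old new))        ; merge bookmarks
(defmethod merge-key :resolvers   [_ old new] (into (or old []) new)) ; concatenate
(defmethod merge-key :entry-point [_ _old new] new)                   ; keep the last one

(defn reduce-configs
  "Reduce a vector of module configs into a single config map."
  [configs]
  (reduce (fn [acc cfg]
            (reduce-kv (fn [acc k v]
                         (assoc acc k (merge-key k (get acc k) v)))
                       acc cfg))
          {} configs))
```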
In the end, to start the system, I call:
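A sketch of that call, assuming the module namespaces from the example and a `reduce-configs` helper implementing the defmethod reduction just described:

```clojure
;; Reduce every module's config into a single one, then invoke its entry point.
(def config
  (reduce-configs [bank.definition/config
                   ids.definition/config
                   history.definition/config
                   rest-server.definition/config]))

((:entry-point config) config)
```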
`(:entry-point rest-server.definition/config)`, for instance, is what spawns the server listening on port 80. If I want to use the system as a CLI, there’s no need to spawn the server and curl it; I could simply swap the last config with a CLI-oriented one.
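Such a swap could be sketched as follows (the CLI module is hypothetical, and `reduce-configs` stands for the defmethod reduction described above):

```clojure
;; Same modules, but the last config contributes a CLI entry point instead.
(def cli-config
  (reduce-configs [bank.definition/config
                   ids.definition/config
                   history.definition/config
                   cli.definition/config]))

((:entry-point cli-config) cli-config)
```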
Splitting into microservices/libraries
If for any reason the need arises, deploying a module as a microservice is trivial from a code perspective.
For the new microservice:
- clone the monolith into a different repository
- remove all undesired modules
- edit the `definition.clj` accordingly
- expose the resolvers it has (Pathom calls this collection of resolvers the index)
For the original monolith:
- remove the module folder except `definition.clj`, changing it in such a way that the monolith is able to merge its index with the one in the new microservice (Pathom allows merging indexes as well)
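With Pathom’s connect, that merge could be sketched as below (obtaining the remote index, e.g. over HTTP, is left abstract):

```clojure
(ns monolith.graph
  (:require [com.wsscode.pathom.connect :as pc]))

;; Combine the monolith's own index with the microservice's exported one.
(defn merged-index [local-index remote-index]
  (pc/merge-indexes local-index remote-index))
```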
Steps for extracting a module to a library would be very similar (in case the module only has pure logic, for example).
Downsides of the new approach
- for my particular case, having the graph figure out the edge traversal is perfectly fine, but if I wanted maximum performance, calling the functions directly could be faster and less resource-intensive (pending benchmarks, though)
- since edges are loose, it’s difficult to “find usages” of a given function or field. The IDE won’t be able to know in which flows a resolver may be used
This approach has given my code huge scalability at a low cost, and I’m fine with the downsides I could think of.
At the moment I have no open-source code to show, but if you’re really interested, contact me and we can work something out.
If you’ve liked the ideas highlighted here regarding graph APIs, please check this talk from Pathom’s author.