On microservice splitting and code refactoring
Let’s say that you work for a company that offers a platform for personal finance management. You have been asked to add a feature that takes, from the customer, a list of company × number of stocks × investment date entries and returns the customer’s total balance at a given point in time.
How big or small should the microservice(s) be?
At one end of the spectrum, you have a single monolith, with scalability issues.
At the other end, you have one service for:
- stock value scraping (
- company info store (
- stock value history store (
- customer input stock store (
This approach gives us a nice separation of concerns and makes it easy to develop and scale each part independently. However, it takes longer to reach an MVP this way, and time to market may be crucial. It also costs more because of overhead (multiple JVMs, databases, message streaming, etc.).
Given your resource constraints, maybe it makes sense to start as a single service (newFeature). But what if you want or need to split it later?
Here are my tips for designing a service that is easy to refactor later, at a low cost during development.
0. Divide your application into layers
1. Don’t be afraid of using many namespaces
Don’t keep all namespaces (packages, files) centralized in a root one. Instead of a single root-level logic namespace, have feature-specific ones such as stockScraper.logic. Have you created a pure function that may be useful to stockStore and to another stock-related namespace? Put it inside stock.logic. Have you written a function for linear interpolation? This math concept isn't restricted to a specific feature, so it doesn't have to live inside newFeature.logic.math, for example. Put it inside common.math. This way you aren't tempted to put stockSum in the same file.
When the time comes for service splitting, you can deploy shared folders such as stock as libraries and start the stockScraper service by cutting and pasting the respective folder. Much easier than traversing the whole codebase later to extract everything you need!
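To make the layout concrete, here is a minimal sketch in TypeScript. The file paths in the comments, the PricePoint type and the function names are assumptions of mine for illustration; only the common.math and stock.logic namespaces come from the text above.

```typescript
// common/math.ts (hypothetical path): generic math with no knowledge of stocks.
function linearInterpolate(x0: number, y0: number, x1: number, y1: number, x: number): number {
  if (x1 === x0) {
    throw new Error("linearInterpolate needs two distinct x values");
  }
  return y0 + ((y1 - y0) * (x - x0)) / (x1 - x0);
}

// stock/logic.ts (hypothetical path): pure stock logic that several features can share.
interface PricePoint {
  timestamp: number; // e.g. milliseconds since the epoch
  price: number;
}

// Estimates a price between two known points by delegating to common.math.
function estimatePriceAt(before: PricePoint, after: PricePoint, timestamp: number): number {
  return linearInterpolate(before.timestamp, before.price, after.timestamp, after.price, timestamp);
}
```

Because linearInterpolate knows nothing about stocks, the folder that holds it can later be published as a library without dragging any feature code along.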
2. Import non-common namespaces with care
Does stockHistory have to import stockScraper a lot? Can't it be minimized? Remember that, after splitting, every such import will translate into an HTTP call or message streaming. Try to at least isolate this dependency into a single namespace, which will be the precursor of the API between the two. This step is the most subjective, because it can quickly introduce overhead in the codebase of an MVP. You must consider the trade-offs.
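One possible way to isolate such a dependency is sketched below in TypeScript. Everything except the stockHistory and stockScraper names is an assumption for illustration: the file paths, the ScraperGateway interface, the fake quote table and the helper functions are hypothetical.

```typescript
// stockScraper/core.ts (hypothetical): stand-in for the real scraping code.
function scrapeLatestPrice(ticker: string): number {
  const fakeQuotes: { [ticker: string]: number } = { ACME: 12.5 };
  const price = fakeQuotes[ticker];
  if (price === undefined) {
    throw new Error("unknown ticker: " + ticker);
  }
  return price;
}

// stockHistory/scraperGateway.ts (hypothetical): the only file in stockHistory
// that touches stockScraper. After a split, this is the file that changes.
interface ScraperGateway {
  latestPrice(ticker: string): number;
}

const inProcessScraperGateway: ScraperGateway = {
  latestPrice: (ticker: string): number => scrapeLatestPrice(ticker),
};

// stockHistory/logic.ts (hypothetical): depends on the gateway, not on stockScraper.
function appendLatestPrice(gateway: ScraperGateway, history: number[], ticker: string): number[] {
  return history.concat([gateway.latestPrice(ticker)]);
}
```

The gateway interface is small on purpose: its methods are a first draft of the API that the two services would expose to each other.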
3. Isolate implementation details into higher level components
Let’s now implement the handler for the endpoint that returns information about all the companies a customer holds stocks in. One common approach is the following (we’re exposing components via dependency injection and middleware):
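A minimal sketch of this approach in TypeScript follows. The names customerCompaniesHandler, getCustomerCompanies and db come from the text; the in-memory Db shape, the StockEntry type and the request and response types are assumptions made so the example runs on its own.

```typescript
interface Company { id: string; name: string; }
interface StockEntry { customerId: string; companyId: string; amount: number; }

// Stand-in for a database component: two in-memory "tables" (an assumption).
interface Db { stocks: StockEntry[]; companies: Company[]; }

// Domain helpers: each one needs the db argument passed down to it.
function getCustomerCompanyIds(db: Db, customerId: string): string[] {
  const ids = db.stocks.filter((s) => s.customerId === customerId).map((s) => s.companyId);
  return ids.filter((id, index) => ids.indexOf(id) === index); // drop duplicates
}

function getCustomerCompanies(db: Db, customerId: string): Company[] {
  const ids = getCustomerCompanyIds(db, customerId);
  return db.companies.filter((c) => ids.indexOf(c.id) !== -1);
}

// Hypothetical request/response shapes; components are injected by middleware.
interface HandlerRequest { params: { customerId: string }; components: { db: Db }; }
interface HandlerResponse { status: number; body: Company[]; }

function customerCompaniesHandler(request: HandlerRequest): HandlerResponse {
  const db = request.components.db; // the handler knows where the data lives
  return { status: 200, body: getCustomerCompanies(db, request.params.customerId) };
}
```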
After splitting our service, not all data will be available via the db argument, of course. We'll need to make some HTTP calls. customerCompaniesHandler will have to get an HTTP component and pass it to getCustomerCompanies, which will propagate it to the domain-specific helper functions and so on...
But why does the handler have to know this in the first place? We have divided our application into layers and, more importantly, we have already separated our code into stockStore and other namespaces. Yet we clearly have an implementation detail leak: all our higher level integration and API code has to know about low level dependency management. These layers should be agnostic to where the data is stored.
Ideally, our handler should only know that it has to fetch data that depends on the company entities. What if we created a component that abstracts this away, per entity? Let's call one of them CompanyRepository (you can come up with the name you like):
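Here is a sketch of what such a component could look like in TypeScript. Only the CompanyRepository name comes from the text; the getCompanies method, the Db shape and the DbCompanyRepository class are assumptions. Calls are synchronous only to keep the sketch short; in real code both the database and the HTTP implementation would probably return Promises, so the interface stays the same across the split.

```typescript
interface Company { id: string; name: string; }

// The boundary: callers only know that companies can be fetched by id.
interface CompanyRepository {
  getCompanies(ids: string[]): Company[];
}

// Stand-in for a database component (an assumption for this sketch).
interface Db { companies: Company[]; }

// Implementation used while everything still lives in a single service.
class DbCompanyRepository implements CompanyRepository {
  private readonly db: Db;

  constructor(db: Db) {
    this.db = db;
  }

  getCompanies(ids: string[]): Company[] {
    return this.db.companies.filter((c) => ids.indexOf(c.id) !== -1);
  }
}
```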
You can skip the interface/protocol definition if you’re not into that. The important thing is to avoid leaking lower level components.
After adding this component to our dependency graph, our code would become:
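A sketch of the handler after this change, in TypeScript. The StockRepository interface, the method names and the request and response shapes are assumptions; the point is only that neither the handler nor getCustomerCompanies mentions a database or HTTP anymore.

```typescript
interface Company { id: string; name: string; }

// Per-entity boundaries; method names are hypothetical.
interface CompanyRepository { getCompanies(ids: string[]): Company[]; }
interface StockRepository { getCompanyIds(customerId: string): string[]; }

interface Components {
  companyRepository: CompanyRepository;
  stockRepository: StockRepository;
}

// Domain code talks to repositories and does not care how they fetch data.
function getCustomerCompanies(components: Components, customerId: string): Company[] {
  const ids = components.stockRepository.getCompanyIds(customerId);
  return components.companyRepository.getCompanies(ids);
}

interface HandlerRequest { params: { customerId: string }; components: Components; }
interface HandlerResponse { status: number; body: Company[]; }

function customerCompaniesHandler(request: HandlerRequest): HandlerResponse {
  return {
    status: 200,
    body: getCustomerCompanies(request.components, request.params.customerId),
  };
}
```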
For the service split, we could create:
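For instance, an HTTP-backed implementation of the same interface could look like the TypeScript sketch below. The HttpClient interface, the base URL, the /companies?ids= route and buildComponents are all hypothetical, and the client is synchronous only for brevity.

```typescript
interface Company { id: string; name: string; }
interface CompanyRepository { getCompanies(ids: string[]): Company[]; }

// Hypothetical HTTP component; a real one would most likely be asynchronous.
interface HttpClient { getJson(url: string): unknown; }

class HttpCompanyRepository implements CompanyRepository {
  private readonly http: HttpClient;
  private readonly baseUrl: string;

  constructor(http: HttpClient, baseUrl: string) {
    this.http = http;
    this.baseUrl = baseUrl;
  }

  getCompanies(ids: string[]): Company[] {
    const query = ids.map((id) => encodeURIComponent(id)).join(",");
    // The route below is an assumed API of the future company service.
    // The response is trusted here; real code should validate it.
    return this.http.getJson(this.baseUrl + "/companies?ids=" + query) as Company[];
  }
}

// Dependency graph after the split: this wiring is the only part that changes.
function buildComponents(http: HttpClient): { companyRepository: CompanyRepository } {
  return { companyRepository: new HttpCompanyRepository(http, "http://company-store.internal") };
}
```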
The only thing we need to do is update the dependency graph! All the rest of the code remains the same.
As a plus, we can move
companyStore. Code reuse!
Suppose you’re absolutely certain that you won’t have to split your recently created microservice or that the process will be easy enough such that you won’t curse your past self. Then, you may not see much value in my article.
However, this isn’t just about spatial organization or isolation. It’s about making domains easier to reason about. Establishing higher level boundaries reduces the cognitive load of working with a codebase: you only need to traverse the code as deep as required. I find it easier to handle a few new abstractions than to reach a mental stack overflow when browsing lines of code.
And even if you decide to keep a single service, it’s much easier for a newcomer to improve stockParser, for example, if they only have to read one root namespace and not the whole microservice code.
Anyway, no one can perfectly measure the ideal microservice size, because it varies with time, company size, the feature’s success and many other factors. So it’s nice to be flexible.
- Duct follows this pattern and calls it boundary.
- You can skip the component creation altogether and define boundaries with resolvers using a library such as Pathom: you define a graph edge that enables you to go from a
company, for example.
Even though I focused on the back-end, this applies to the front-end as well. Why make the CompanyDetailsPage receive an HTTP component and perform a request? It can simply receive a CompanyRepository and you can call