What does modern software architecture look like in 2022?
Modern #softwarearchitecture is explicitly as much about time as it is about the state of the system:
- “this lib will become a separate service when the load exceeds that threshold”
- “this CRUD module will be refactored into a Hexagonal Architecture when it grows more complex”
- “we’ll replace this SaaS service by an in-house one when our needs get more sophisticated”
- “we’ll split this Bounded Context into two when the corresponding team has grown beyond 9 members”
Modern #softwarearchitecture is explicitly about time, it’s dynamic. It’s about evolution, options or irreversible decisions.
It’s no more about detailed target architecture diagrams!
Is it true? Yes. It’s exactly the story of the service I’m working on. It will be in a perpetual “in progress” state because I have made many choices with an explicit expiration date.
We removed dozens of machine learning models from the application when we deployed their new version as separate services. However, between the first deployment and the day we had an external ML model for all supported use cases, the code handled some requests by interacting with external systems, while others were handled inside the application. We kept the old implementation as a backup even when we externalized a model.
When the application was in the intermediate state in the middle of model migration, we had two execution paths for everything. We could switch between them at runtime by flipping a switch in the UI. Of course, an implementation with two execution paths requires lots of tests. Fortunately, the strategy pattern is an excellent way to handle such a situation, so we didn’t need to duplicate the code.
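To make the idea concrete, here is a minimal sketch of the strategy pattern with a runtime switch. It is not the actual code of the service: the names `Model`, `InProcessModel`, `ExternalModel`, `SwitchableModel`, and `fake_service` are all hypothetical, and the "models" are trivial stand-ins. The point is only that callers depend on one interface while the flag decides which execution path handles each request.

```python
from typing import Callable, Protocol


class Model(Protocol):
    """The strategy interface: callers depend only on this."""

    def predict(self, text: str) -> str: ...


class InProcessModel:
    """Old path: the model runs inside the application (stand-in logic)."""

    def predict(self, text: str) -> str:
        return "long" if len(text) > 20 else "short"


class ExternalModel:
    """New path: delegates to a separately deployed service."""

    def __init__(self, call_service: Callable[[str], str]) -> None:
        # call_service is a hypothetical client function, e.g. an HTTP call.
        self._call_service = call_service

    def predict(self, text: str) -> str:
        return self._call_service(text)


class SwitchableModel:
    """Picks a strategy per request, so the flag can be flipped at runtime."""

    def __init__(self, in_process: Model, external: Model) -> None:
        self._in_process = in_process
        self._external = external
        self.use_external = False  # a UI switch would set this attribute

    def predict(self, text: str) -> str:
        active = self._external if self.use_external else self._in_process
        return active.predict(text)


def fake_service(text: str) -> str:
    """Stand-in for the remote call, so the sketch runs without a network."""
    return f"external:{len(text)}"


model = SwitchableModel(InProcessModel(), ExternalModel(fake_service))
```

With this shape, removing the old path later means deleting `InProcessModel` and the switch, and the calling code does not change. That is one way to keep the eventual cleanup cheap.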
The intermediate state lasted for almost two years. During those two years, we added new features to the application and changed the existing behavior. Every code change had two implicit requirements. The first: it had to work with both ML implementations. The second was harder to test: the code we wrote could not make it difficult to remove the obsolete ML models once we finished the migration. We succeeded, and when the time came, removing the outdated code took only half a day of work.
Right now, we are migrating the text tokenization code. As a result, every tokenizable text has two vector representations — one from the old tokenizer and one from the new one. The application stores both, and we can switch between them at runtime. One day, we will switch to the new implementation, and sometime later, we will remove all of the code related to the previous version. Until it happens, we must deal with having twice the required data. However, we don’t treat it as a blocker or an excuse. The system architecture is dynamic, and multiple parts change simultaneously. None of the changes can block any other change. In the meantime, we migrated the databases to a new DBMS and continued adding new features.
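A rough sketch of what "store both, switch at runtime, remove later" could look like. Everything here is an assumption made for illustration: `DualRepresentationStore`, `old_tokenizer`, and `new_tokenizer` are hypothetical names, the tokenizers are toy functions, and a real system would keep the vectors in a database rather than in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[int]]


def old_tokenizer(text: str) -> List[int]:
    """Toy stand-in for the old tokenizer: one number per word (its length)."""
    return [len(word) for word in text.split()]


def new_tokenizer(text: str) -> List[int]:
    """Toy stand-in for the new tokenizer: vowel count per word."""
    return [sum(ch in "aeiou" for ch in word) for word in text.lower().split()]


@dataclass
class DualRepresentationStore:
    tokenizers: Dict[str, Tokenizer]
    active: str  # which representation readers get; can be changed at runtime
    _vectors: Dict[str, Dict[str, List[int]]] = field(
        default_factory=dict, init=False
    )

    def add(self, doc_id: str, text: str) -> None:
        # Compute and keep one vector per registered tokenizer.
        self._vectors[doc_id] = {
            name: tokenize(text) for name, tokenize in self.tokenizers.items()
        }

    def vector(self, doc_id: str) -> List[int]:
        return self._vectors[doc_id][self.active]

    def retire(self, name: str) -> None:
        """Drop a tokenizer and its stored vectors once the migration is done."""
        if name == self.active:
            raise ValueError("cannot retire the active tokenizer")
        del self.tokenizers[name]
        for versions in self._vectors.values():
            versions.pop(name, None)


store = DualRepresentationStore(
    tokenizers={"old": old_tokenizer, "new": new_tokenizer}, active="old"
)
store.add("doc-1", "Hello world")
```

Setting `store.active = "new"` switches readers to the new vectors, and `store.retire("old")` is the step that finally frees the doubled storage.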
How to document such decisions?
I still have a detailed target architecture diagram. It has many sticky notes explaining the differences between the target architecture and reality. Also, the target constantly changes. To make it even more complicated, we have to remember the decisions we have made in the past. How do we do it?
In our case, we use Architecture Decision Records to explain what we have decided, why, and when we must revisit the decision. For some decisions, we create a Jira ticket right away to make a change “when X happens.” I think it’s ok. After all, many decisions come with a caveat: “We know it’s not ideal, and we will rewrite it when X happens. Because when X happens, we will have enough money to afford using service Y, which makes the entire problem disappear.”
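An ADR is usually a short text document, but the “when X happens” part can also be tracked mechanically. The sketch below is one possible way to do that, not a description of our tooling: `DecisionRecord`, `due_for_revisit`, the metric names, and the threshold of 500 requests per second are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass(frozen=True)
class DecisionRecord:
    title: str
    decision: str
    reason: str
    revisit_when: str  # human-readable trigger, as written in the ADR
    is_due: Callable[[Metrics], bool]  # the same trigger as a check


def due_for_revisit(
    records: List[DecisionRecord], metrics: Metrics
) -> List[DecisionRecord]:
    """Return the decisions whose revisit condition is met by the metrics."""
    return [record for record in records if record.is_due(metrics)]


records = [
    DecisionRecord(
        title="Keep a single Bounded Context",
        decision="One context owned by one team",
        reason="The team is small enough to own all of it",
        revisit_when="the team has grown beyond 9 members",
        is_due=lambda m: m.get("team_size", 0) > 9,
    ),
    DecisionRecord(
        title="Ship the feature as a library",
        decision="No separate service yet",
        reason="Load is low, and a service adds operational cost",
        # 500 is an arbitrary illustrative threshold.
        revisit_when="load exceeds 500 requests per second",
        is_due=lambda m: m.get("requests_per_second", 0) > 500,
    ),
]
```

Calling `due_for_revisit(records, {"team_size": 10, "requests_per_second": 100})` returns only the Bounded Context decision, which is the moment to open (or pick up) the corresponding ticket.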
Implementation cost, time, and team size are always relevant constraints. I prefer to implement the happy path of a use case and deal with edge cases when users complain about them. When it happens, I know they use it! Spending weeks making something perfect and prepared for 100x traffic makes no sense when you don’t know if it’s worth the effort.
In software, the only irreversible decisions are those causing data loss: removing the data, deciding you don’t need to log something, etc. As long as you have the data, you can change every decision. Keep calm and keep the data (unless it’s too expensive or the lawyers tell you that you can’t store it).