How to balance software architecture goals with limited resources
- Eve Ragins
I’m reminded of a survival RPG where you’re stranded somewhere, and you need to either escape the island or build a life-sustaining place to live. You have a couple things provided for you—some food, some water, and maybe a talking travelogue—but you know it’s not enough to last. You’ll need to ration and be strategic about how to use your resources and potentially acquire more resources to succeed. You’ll probably even want to build some tools along the way.
Trying to introduce architectural changes while balancing resources feels roughly similar—except the stakes of getting it wrong could be wasting millions of dollars as opposed to watching a sad montage wherein your character dies.
To clarify a couple of terms for project inclusivity, “resources” is used very broadly here. It could be time, money, know-how, infrastructure, etc. It could arguably be stakeholder buy-in, too, but that’s a different blog post. Sometimes “resources” includes “personnel”, but let’s try to break that cycle: People are people, and a shortage of people really indicates a lack of time, money, or know-how. This isn’t to diminish the very real challenges which come from being understaffed, though.
“Architecture” is also used in two senses here. Possibly you’re on a green-field project and don’t have architecture yet per se, but more likely you’re on an existing project and need to do some kind of overhaul or expansion. This post will focus on the latter and assume that you already have at least a live MVP.
Do you know where you’re going?
Your current situation isn’t good or you wouldn’t be thinking of architectural changes. This is, of course, relative, and could range anywhere from “mildly annoying” to “our system is actively falling over and we can’t put out fires fast enough”.
Why do you want to change the architecture?
- What specifically about the current architecture isn’t working?
- How is it impacting your team’s morale?
- How is it impacting your team’s ability to adapt to new business requirements or fix bugs? (Obligatory reminder that complicated != better)
- What is the most pressing issue that you need to overcome?
- What are you really trying to accomplish?
Envision the end state.
Really think about this.
- How will your basic use cases work?
- What about the most complicated use case or two?
- What about likely use cases the business has expressed a desire for but you haven’t been able to implement?
Don’t be hand-wavy when planning.
Take the time to break it down, map it out, share it with your team, and accept criticism.
For a non-exhaustive list of things to consider:
- How will the database(s) need to change?
- How will your APIs change? Are you going to set up a new version or aim for backwards compatibility (which maybe you can phase out over time)?
- What about the UI?
- How will the above impact developer productivity? Do you need to plan for training or additional tooling?
- How will you handle production support? Consider both “external” support teams if you have them as well as ensuring that your team will be set up for success when issues come up.
- What will need to happen to your test and deployment pipelines? What are the security implications and how will you address them?
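As one way to make the API question above concrete, here is a minimal sketch of keeping an old API version alive as a thin adapter over a new one. The payload shapes, field names, and function names are all hypothetical and invented for illustration; they are not taken from any real system.

```python
from typing import Any


def get_device_v2(device_id: str) -> dict[str, Any]:
    """Hypothetical v2 handler: returns the new, nested response shape."""
    # A real handler would read from storage; this value is hard-coded for the sketch.
    return {"device": {"id": device_id, "state": "on"}}


def get_device_v1(device_id: str) -> dict[str, Any]:
    """Hypothetical v1 handler kept for backwards compatibility.

    It delegates to v2 and reshapes the result, so there is only one
    real implementation to maintain while v1 is being phased out.
    """
    v2 = get_device_v2(device_id)
    return {
        "device_id": v2["device"]["id"],
        "is_on": v2["device"]["state"] == "on",
    }
```

The design choice worth noticing is that the old version carries no logic of its own. When the time comes to retire it, deleting the adapter should not affect the new path.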
If this sounds like a lot, it’s because it is. If you’re worried this sounds like waterfall but you’re an agile shop, well, you’re not wrong. (More on this later.) The point of this exercise isn’t to build a master plan and stick to it, but instead to develop enough definition that you can approach the work confidently and with agility.
I encourage you to use your best judgement on just how detailed you should get. I’ll also give you this real-world tale:
The setting was a small IoT start-up. We had an MVP out the door and were trying to introduce a new device which would toggle on or off based on other nearby devices. The most popular solution was to have “on the ground” radio communication between the devices. The alternate solution was to have coordination done in the cloud. After getting extremely detailed (down to the task level) on the effort for each, we ended up implementing the cloud approach, with distant-future plans to construct the “on the ground” version. I don’t think the “on the ground” contingent ever finished tasking things out. The difference in effort was just that large, and it was only through this exercise that it became apparent.
Can you afford the end state?
Since this is an article about resources, it’s worth asking if you’ll be able to afford the end state before even thinking about how you’re going to achieve it.
For example, if the architectural plan calls for cloud resources, do you have the budget? (Maybe going all-in on AWS with an API Gateway that wraps Kafka which puts something on EventBridge which then triggers chains of Lambda calls with Step Functions while using SNS sounds perfect in the hypothetical, but each of those touch points costs money. Scenario slightly exaggerated for effect.)
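To make the “each touch point costs money” point a little more tangible, here is a back-of-the-envelope sketch. Every unit price below is a made-up placeholder, not a real price from any provider, and the touch-point names are generic stand-ins; swap in figures from your provider’s pricing pages before drawing any conclusions.

```python
# All unit prices are HYPOTHETICAL placeholders (dollars per million events),
# not real prices from any cloud provider.
HYPOTHETICAL_PRICE_PER_MILLION = {
    "api_gateway": 1.00,
    "event_bus": 1.00,
    "function_invocation": 0.50,
    "notification": 0.50,
}


def monthly_cost(events_per_month: int, touch_points: list[str]) -> float:
    """Sum the per-event cost of every touch point a single event passes through."""
    per_million = sum(HYPOTHETICAL_PRICE_PER_MILLION[name] for name in touch_points)
    return events_per_month / 1_000_000 * per_million


# One event passes through a gateway, a bus, a chain of three functions,
# and finally a notification.
pipeline = [
    "api_gateway",
    "event_bus",
    "function_invocation",
    "function_invocation",
    "function_invocation",
    "notification",
]
estimate = monthly_cost(200_000_000, pipeline)
print(f"Estimated monthly cost: ${estimate:,.2f}")
```

Even with invented numbers, the shape of the calculation is the useful part: cost scales with both traffic and the number of hops, so every extra hop in the design multiplies across every event.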
If it’s using a new technology, will you be able to afford the time it takes your team to learn how to maintain it in production, or to pay for consultants in the interim? What if, while learning, your team makes honest mistakes which cost even more money?
If the answer to “Can you afford the end state?” is “No”, then you might need to go back to the drawing board to redefine what the end state is. Alternatively, and this is probably true anyway, you should start thinking about how to slice the architecture up so that you can deliver the necessary value now with a plan for the rest of it when funding allows. More on this later.
Do you currently have the resources to get there?
Probably not, or you’d just go for it. 🙂
So, where do you have wiggle room?
In the project triangle, there are three points for cost, time, and scope.
Let’s tweak the names just a little to align more closely with what we’re dealing with: paying people (cost), team bandwidth (time), and new releases for bug fixes or features (scope). Since staffing costs typically far exceed other operating costs, those are the focus here.
If at this point you’re thinking that you have no wiggle room at all—that you’re already operating on a shoestring, your team is working overtime, and there’s no way you can divert from bug and feature work…you’ll just have to be more creative.
In most circumstances, no obvious wiggle room will mean delaying releases and deferring scope. You can also try negotiating for more money. I don’t recommend asking your team to work harder or for longer hours or less pay—that ultimately results in mistakes, burnout, resentment, or all of the above. Not good.
If your largest wiggle room is team bandwidth…
Let some of your team focus on architecture. Add in pairing time so that the people who aren’t focusing on it can contribute, and it will make the learning curve much smaller when the new architecture is complete.
If your largest wiggle room is cash…
Consider bringing on more staff, even in the short-term. Don’t forget that it will take these staff a little bit of time to ramp up and that they will need some of your team’s time.
If your wiggle room is bug fixes and features…
You’ll need to start setting expectations with stakeholders that, in order to be able to deliver what they want efficiently in the future, or at all, you’ll need to slow down and focus on some technical house-keeping.
In all cases, bringing on consultants who have already done this successfully can save you time and money in the long run.
How are you going to get there?
You know where you want to go. And you know that when you get there, it will be rainbows and unicorns. 🌈🦄
Unfortunately, it’s probably not in your power to just pause time while the plan comes to fruition, and it shouldn’t be. Plan for a slow transition where your team keeps the lights on while also working towards a new architectural future. This is where agility comes in.
Go back to your end state, look at all the components and all the changes that it will take to get there.
- What depends on what?
- What can you peel away while still adding value?
- What can you defer since there’s limited near-term value?
- What can you start implementing cleanly? Maybe there are some new columns you can add to a database or maybe you can refactor part of your code to allow for alternate routing.
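The “what depends on what?” question can be sketched as a small dependency graph. The example below is a minimal illustration using Python’s standard-library `graphlib`; the chunk names are borrowed loosely from the reporting example later in this post, and the exact breakdown is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Each chunk of work maps to the set of chunks it depends on.
# The breakdown itself is an illustrative assumption.
depends_on = {
    "reports": {"hierarchical data"},
    "hierarchical data": {"robust device identifiers"},
    "features C, D, E": {"robust device identifiers"},
    "robust device identifiers": set(),
}

# static_order() yields every chunk after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
for step, chunk in enumerate(order, start=1):
    print(step, chunk)
```

If two chunks turn out to depend on each other, `static_order()` raises `graphlib.CycleError`, which is a useful signal in itself: that part of the plan probably needs to be sliced differently.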
Most likely there are some parts of your vision that you can start implementing soon and that will provide value soon.
What’s the absolute least that you can do to unblock a key feature request or address a gnarly part of the application?
What’s the smallest thing you can do that doesn’t negatively impact other functionality but still sets you up for the future?
Plan to move in phases, balancing delivery and architecture at every step.
- Slow down on the releases, but don’t stop them unless you absolutely have to.
- Provide value.
- Prove to your stakeholders why letting you have some time for house-keeping will help you keep up with them.
- Where at all possible, draw direct lines between features that they want, what’s blocking them technically, and what else they get from the change.
For example, “I know you really want those reports, but we need to transform our data to be hierarchical in order to build them for you. In order to do that, though, we need to first change how our devices are identified to be more robust as hardware changes. Here’s our plan for that, here’s the expected timeline, and here’s where we currently are. Also, by changing how our devices are identified, it will allow for features C, D, and E in the future, which we wouldn’t have been able to do previously.”
It should also be said that by going incrementally, the new code gets rolled in with the old code—and this is a good thing. It gets tested, and as the old code changes the new code will also be updated as opposed to being off in a branch somewhere resulting in multi-day merge sessions. Having some covering tests here, too, can go a very long way toward confidently being able to roll out changes.
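Here is a minimal sketch of what rolling new code in alongside the old code might look like, together with a covering test. The device-lookup functions, the record shape, and the `use_new_path` flag are all hypothetical, invented for illustration only.

```python
from typing import Optional

# Hypothetical record shape: {"name": str, "serials": list[str]},
# stored in a dict keyed by the device's original hardware serial.


def find_device_legacy(devices: dict, serial: str) -> Optional[dict]:
    """Existing behavior: look the device up directly by its original serial."""
    return devices.get(serial)


def find_device_new(devices: dict, serial: str) -> Optional[dict]:
    """New behavior: match any serial the device has ever had."""
    for device in devices.values():
        if serial in device["serials"]:
            return device
    return None


def find_device(devices: dict, serial: str, use_new_path: bool = False) -> Optional[dict]:
    """Single entry point; the flag decides which implementation runs."""
    impl = find_device_new if use_new_path else find_device_legacy
    return impl(devices, serial)


def test_both_paths_agree() -> None:
    """Covering test: on today's data, old and new paths must give the same answer."""
    devices = {"SN-1": {"name": "porch light", "serials": ["SN-1"]}}
    for serial in ("SN-1", "SN-404"):
        old = find_device(devices, serial)
        new = find_device(devices, serial, use_new_path=True)
        assert old == new


test_both_paths_agree()
```

Because both paths live behind one entry point in the main codebase, the new one is exercised and updated alongside the old one rather than drifting in a long-lived branch.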
Be wary of choosing a single feature to implement in the “new” way.
Have a new architectural plan? Have a new feature? It might seem like an obvious strategy to branch off and do the feature in an entirely new way. But in practice what I’ve seen happen is that the rest of the application never gets updated. In the worst case, the new feature ends up being the only one done in the new style in production, and then you’re stuck maintaining it separately from everything else. Now you’ve added to technical debt.
That isn’t to say it’s never a good idea, but unless you really want to do a re-write or maintain two different applications, make it as easy as possible for your existing code to slide into new patterns.
My current theory for why this approach goes awry: when starting with a blank page, the delta between the new and the old ends up being just a bit too much to overcome.
Just…be careful if you go this route.
Circling back around, the potential for an architectural plan to go astray is why having a clear, not hand-wavy, plan of where you want to end up is so important to introducing architectural changes successfully.
- Know where you want to go.
  - Is it really a good end-state?
  - Do you need to do a spike or two to affirm assumptions?
- Publicize where you want to go in as much audience-appropriate detail as possible.
- Lean into the places where you have wiggle room on the project triangle (cost / time / scope).
- Break your target architecture down into small workable chunks.
  - What are the dependencies?
  - Do you have covering tests? (If not, make some!)
- Tie business features and bug fixes directly to architectural changes and vice-versa.
- Plan to release changes in as small chunks as possible.
- Earn (or re-affirm) the trust of your stakeholders by delivering timely quality.
- Enlist outside support for any of the above (Like us, maybe. 😎)