When it comes to AI and programming, vibe coding is all the rage these days. I’ve tried it, to an extent, and commented about it at length. While a lot of people seem to believe it’s a game changer for SW development, among experienced SW engineers there’s a growing realization that it is not a panacea. In some cases I’ve even seen resentment or scorn at the idea that vibe coding is anything more than passing hype.
I personally don’t think it’s just hype. It might be more in the zeitgeist at the moment, but it won’t go away. I believe that simply because it’s not a new trend. Vibe coding, in my opinion, is nothing more than an evolution of low/no-code platforms. We’ve seen this type of tool since MS-Access and Visual Basic back in the 90s. It definitely has its niche, a viable one, but it’s not something that will eradicate the SW development profession.
I do think that AI will most definitely change how developers work and what programming looks like. But this still will not make programmers obsolete.
This is because the actual challenges are elsewhere.
The Real Bottlenecks in Software Engineering
In fact, I think we’re only scratching the surface here. Partially because the technology and tooling are still evolving, but also because most people[1] looking at improving software engineering seem to be looking at the wrong problem.
Anyone who’s been in this business professionally has realized at some point that code production is not the real bottleneck for a productive software engineer. It never was.
The real challenges in real-world software development, especially at scale, are different. They revolve mainly around producing coherent software with many people who need to interact with one another:
- Conquering complexity: understanding the business and translating it into working code, and understanding large code bases.
- Communication overhead: the amount of coordination required between different teams when aligning on design choices[2]. We often end up with knowledge silos.
- Maintaining consistency: using the same tools, practices and patterns so that operation and evolution are easier. This is especially true at large organizational scale, and over time.
- Analyzing impact: understanding the effects of changes is hard, and tracing decisions back isn’t easy.
A lot of the energy and money invested in day-to-day professional software development goes into managing this complexity and delivering software at a consistent (increasing?) pace, with acceptable quality. It’s no surprise there’s a whole ecosystem of methodologies, techniques and tools dedicated to alleviating some of these issues. Some are successful, some not so much.
Code generation isn’t really the hard part. That’s probably the easiest part of the story. Having a tool that does it slightly faster[3] is great, and it’s helpful, but this doesn’t solve the hard challenges.
We should realize that code generation, however elaborate, is not the entire story. It’s also about understanding the user’s request, constraints and existing code.
The point here isn’t to dismiss the fantastic innovations in the technology. My point is rather that they’re applied to the least interesting problem. As great as the technology and tooling are, and they are great, simply generating code doesn’t solve the big challenges.
This leads me to thinking: is this it?
Is all the promise of AI, when it comes to my line of work, just typing the characters I dictate, faster?
Don’t get me wrong, it’s nice to have someone else do the typing[4], but this seems somewhat underwhelming. It certainly isn’t a game changer.
Intuitively, this doesn’t seem right. But to see why, we need to take a step back and consider LLMs again.
LLM Strengths Beyond Code Generation
Large Language Models, as the name implies, are pretty good at understanding, well – language. They’re really good at parsing and producing text, at “understanding” it. I’m avoiding the philosophical debate on the nature of understanding[5], but I think it’s pretty clear at this point that when it comes to natural language understanding, LLMs provide a very clear advantage.
And this is where it gets interesting. Because when we look at the real world challenges listed above, most of them boil down to communication and understanding of language and semantics.
LLMs are good at:
- Natural language understanding – identifying concepts in written text.
- Information synthesis – connecting disparate sources.
- Pattern recognition
- Summarization
- Structured data generation
And when you consider mechanizing these capabilities, which is exactly what LLMs do, you can see the doors this opens.
These capabilities map pretty well to the problems we have in large scale software engineering. Take, for example, pattern recognition. This should help with mastering complexity, especially when complexity is expressed in human language[6].
Another example might be in addressing communication overhead. It can be greatly reduced when the communication artifacts are generated by agents armed with LLMs. Think about drafting decisions, specifications, summarizing notes and combining them into concrete design artifacts and project plans.
It’s also easier to maintain consistency in design and code, when you have a tireless machine that does the planning and produces the code based on examples and design artifacts it sees in the system.
It should also be easier to understand the impact of changes when you have a machine that traces and connects decisions to concrete artifacts and components. A machine that checks changes in code isn’t new (you probably know it as “a compiler” or “static code analyzer”). But one that understands high level design documents and connects them, eventually, to the running code, with no extra metadata, is a novelty. Think about an agent that understands your logs and your ADRs, and uses them to find bottlenecks or brainstorm potential improvements.
And this is where it starts to get interesting.
It’s interesting because this is where mechanizing processes starts to pay off – when we address the scale of the process and volume of work. And we do it with little to no loss of quality.
If we can get LLMs to do a lot of the heavy lifting when it comes to identifying correlations, understanding concepts and communicating about it, with other humans and other LLMs, then scaling it is a matter of cost[7]. And if we manage this, we should be on the road to, I believe, an order of magnitude improvement.
So where does that leave us?
Augmenting SW Engineering Teams with LLMs
You have your existing artifacts – your meeting notes, design specifications, code base, language and framework documentation, past design decisions, API descriptors, data schemas, etc.
These are mostly written in English or some other known format.
Imagine a set of LLM-based software agents that connect to these artifacts, understand the concepts and patterns, make the connections and start operating on them. This has an immediate potential to save human time by generating artifacts (not just code), but also to make a lot of the communication more consistent. It also has the potential to highlight inconsistencies that would otherwise go unnoticed.
Consider, for example, an ADR assistant that takes in a set of meeting notes, some product requirements document(s) and past decisions, automatically identifies the new decisions that were reached, and generates succinct, focused ADRs for them.
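To make this a bit more concrete, here’s a minimal sketch of what such an ADR assistant could look like. The `complete` callable stands in for whatever LLM client you actually use, and the prompt, file layout and section names are assumptions for illustration, not a prescription:

```python
from pathlib import Path
from typing import Callable

ADR_PROMPT = """You are an architecture decision record (ADR) assistant.
Given the meeting notes, product requirements and existing ADRs below,
identify any NEW architectural decisions that were reached, and write one
succinct ADR per decision with the sections: Context, Decision, Consequences.

## Existing ADRs
{existing_adrs}

## Product requirements
{requirements}

## Meeting notes
{notes}
"""


def _read_all(directory: Path) -> str:
    """Concatenate all markdown files in a directory into one blob of context."""
    return "\n\n".join(p.read_text() for p in sorted(directory.glob("*.md")))


def draft_new_adrs(
    notes_dir: Path,
    requirements_dir: Path,
    adr_dir: Path,
    complete: Callable[[str], str],  # stand-in for whatever LLM client you use
) -> str:
    """Assemble the existing artifacts into a prompt and ask the LLM for draft ADRs.
    The output is a draft; a human still reviews it before anything is merged."""
    prompt = ADR_PROMPT.format(
        existing_adrs=_read_all(adr_dir),
        requirements=_read_all(requirements_dir),
        notes=_read_all(notes_dir),
    )
    return complete(prompt)
```

The interesting part isn’t the plumbing; it’s that the inputs are artifacts the team already produces, and the output is just another human-reviewable markdown artifact.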
Another example would be an agent that can act as a sounding board for design thinking – you throw your ideas at it, allow it to access existing project and system context as well as industry standards and documentation. You then chat with it about where best practices apply, and where the risks lie in given design alternatives. Design review suddenly becomes more streamlined when you can simply ask the LLM to bring up issues in the proposed design.
Imagine an agent that systematically builds a knowledge graph of your system as it grows. It does this in the background by scanning committed code and connecting it with higher level documentation and requirements (probably generated by another agent). Understanding the impact of changes becomes easier when you can access such a semantic knowledge graph of your project. Connect it to a git tool and it can also understand code/documentation changes at a very granular level.
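As a rough illustration of such a background agent, here’s a sketch that uses networkx for the graph and, again, a placeholder `complete` callable for the LLM; the node/edge vocabulary and the prompt are invented for the example:

```python
import json
from typing import Callable

import networkx as nx

EXTRACT_PROMPT = """From the commit diff and the related design document below, extract
entities (components, requirements, decisions) and the relations between them.
Answer ONLY with JSON of the form:
{{"nodes": [{{"id": "...", "kind": "..."}}],
 "edges": [{{"source": "...", "target": "...", "relation": "..."}}]}}

## Commit diff
{diff}

## Design document
{doc}
"""


def update_knowledge_graph(
    graph: nx.MultiDiGraph,
    commit_diff: str,
    design_doc: str,
    complete: Callable[[str], str],  # stand-in for your LLM client
) -> nx.MultiDiGraph:
    """Ask the LLM to extract concepts from a commit plus the documentation it touches,
    then merge them into the existing graph."""
    # Assumes the model actually returns valid JSON; real code would validate and retry.
    extracted = json.loads(complete(EXTRACT_PROMPT.format(diff=commit_diff, doc=design_doc)))
    for node in extracted["nodes"]:
        graph.add_node(node["id"], kind=node["kind"])
    for edge in extracted["edges"]:
        graph.add_edge(edge["source"], edge["target"], relation=edge["relation"])
    return graph
```

Run something like this on every merge (a CI job or a git hook would do) and the graph stays current; impact questions then become graph traversals plus an LLM pass over the relevant neighborhood.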
None of these examples eliminates the human in the loop; keeping one is actually a common pattern in agentic systems. I don’t think the human(s) can or should be eliminated from the loop. It’s about empowering human engineers to apply intuition and higher level reasoning. Let the machine do the heavy lifting of producing text and scanning it. And in this case we have a machine that can not only scan the text, but also understand the higher level concepts in it, to a degree. Humans immediately benefit from this, simply because humans and machines now communicate in the same natural language, at scale.
We can also take it a step further: we don’t necessarily need a complicated or very structured API to allow these agents to communicate amongst themselves. Since LLMs understand text, simple markdown with some basic structure (headers, blocks) is a pretty good starting point for an LLM to infer concepts. Combine this with diagram-as-code artifacts and you have another win – LLMs understand these structures as well. All with the same artifacts understandable by humans. There’s no need for extra conversions[8].
So now we can have LLMs communicating with other LLMs, to produce more general automated workflows. Analyzing requirements, in the context of the existing system and past decisions, becomes easier. Identifying inconsistencies or missing/conflicting requirements can be done by connecting a “requirement analyzer” agent to the available knowledge graph produced and updated by another agent. What-if scenarios are easier to explore in design.
Such agents can also help with producing more viable plans for implementation, especially taking into consideration existing code bases. Leaning on (automatically updated) documentation can probably help with LLM context management – making it more accurate at a lower token cost.
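A compressed sketch of that kind of agent chain: one “agent” analyzes the new requirements against a system summary produced elsewhere, a second one turns the result into a draft plan, and the hand-off between them is nothing more than markdown text. The prompts and step names are, of course, made up:

```python
from typing import Callable


def requirements_to_plan(
    requirements_md: str,
    system_summary_md: str,
    complete: Callable[[str], str],  # stand-in for your LLM client
) -> str:
    """Two LLM 'agents' chained through plain markdown - no special protocol needed."""
    # Agent 1: flag conflicting or missing requirements against the current system.
    analysis = complete(
        "Compare the new requirements against the system summary. "
        "List conflicts, gaps and open questions as markdown.\n\n"
        f"## System summary\n{system_summary_md}\n\n## New requirements\n{requirements_md}"
    )
    # Agent 2: turn the requirements plus the analysis into a draft implementation plan.
    plan = complete(
        "Write a draft implementation plan (milestones, affected components, risks) "
        "for the requirements below, taking the analysis into account.\n\n"
        f"## Requirements\n{requirements_md}\n\n## Analysis\n{analysis}"
    )
    return plan
```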
Mechanizing Semantics
We should be careful here not to fall into the trap of assuming this is simple automation, a sort of more sophisticated robotic process automation, though that has its value as well.
I think it goes beyond that.
A lot of the work we do on a day to day basis is about bringing context and applying it to the problem or task at hand.
When I get a feature design to be reviewed, I read it, and start asking questions. I try to apply systems thinking and first-principles thinking. I bring in the context of the system and business I’m already aware of. I try to look at the problem from different angles, and ask a series of “what-if” questions about the proposed design. Sometimes it’s surfacing implicit, potentially harmful, assumptions. Sometimes it’s just connecting the dots with another team’s work. Sometimes it’s bringing up the time my system was hacked by a security consultant 15 years ago (true story). There’s a lot of experience that goes into that. But essentially it’s applying the same questions and thought processes to the concepts presented on paper and/or in code.
With LLMs’ ability to derive concepts, identify patterns in them, and with vast embedded knowledge, I believe we can encode a lot of that experience into them – whether by fine-tuning, clever prompting or context building. A lot of these thinking steps can be mechanized. It seems we have a machine that can derive semantics from natural language. We have the potential to leverage this mechanization in the day-to-day of software production. It’s more than simple pattern identification. It’s about bridging the gap between human expression and formal methods (be it diagrams or code). That gap seems to be getting smaller by the day.
Let’s not forget that software development is usually a team effort. And when we have little automatic helpers that understand our language and make connections to existing systems, patterns and vocabulary, they’re also helping us communicate amongst ourselves. In a world where remote work is prevalent, development teams are often geographically distributed and communicate in a language that is native to no one on the team – having something that summarizes your thoughts, verifies meeting notes against existing patterns and ultimately checks whether your components play nicely with the plans of other teams, all in perfect English, is a definite win.
This probably won’t be an easy thing to do, and it will have a lot of nuances (e.g. legacy vs. newer code, different styles of architecture, evolving non-functional requirements). But for the first time I feel this is a realistic goal, even if it’s not immediately achievable.
Are We Done?
This of course raises the question – where is the line? If we can encode our experience as developers and architects into the machine, are we really on the path to obsolescence?
My feeling is that no, we are not. At the end of the process, after all alternatives are weighed, assumptions are surfaced and trade-offs are considered, a decision needs to be taken.
At the level of code writing, this decision – what code to produce – can probably be taken by an LLM. This is a case where the constraints are clearer, and with the correct context and understanding there’s a good chance of getting it right. The expected output is also more easily verifiable.
But this isn’t true for more “strategic” design choices. Things that go beyond code organization or localized algorithm performance. Choices that involve human elements like skill sets and relationships, or contractual and business pressure. Ultimately, the decision involves a degree of intuition. I can’t say whether intuition can be built into LLMs; intuitively, I believe it can’t (pun intended). I highly doubt we can emulate it using LLMs, at least not in the foreseeable future.
So when all the analysis is done, the decision maker is still a human (or a group of humans). A human that needs to consider the analysis, apply their experience, and decide on a course forward. If the LLM-based assistant is good enough, it can present a good summary and even recommendations, all produced automatically. But this analysis still needs to be understood and used by humans to reach a conclusion.
Are we there yet? No.
Are we close? Closer than ever probably, but still a way to go.
Can we think of a way to get there? Probably yes.
A Possible Roadmap
How can we realize this?
The answer seems to be, as always, to start simple, integrate and iterate; ad infinitum. In this case, however, the technology is still relatively young, and there’s a lot going on – anything from foundation models, relevant databases and coding tools, to prompt engineering, MCPs and beyond. These are all being actively researched and developed, so trying to predict how this will evolve is even harder.
Still, if I had to guess how this will evolve in practice, this is how I think it will go – or at least one possible path.
Foundational System Understanding
We’ll probably start with simple knowledge building. I expect we’ll first see AI agents that can read code, and produce and consume design knowledge – how current systems operate. This is already happening, and I expect it to improve. It’s the natural starting point because the task is well understood and the tools already exist: we can verify the results and fine tune the techniques.
Examples of this could be AI agents that produce detailed sequence diagrams of existing code and then identify its components. Other AI agents can consume design documents/notes and meeting transcriptions, together with the already generated descriptions, to produce an accurate record of the changed or enhanced design. Having these agents work continuously and consistently across a large system already provides value.
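A toy version of the first kind of agent might look like the sketch below: walk the repository, hand each module to the LLM and ask for a Mermaid sequence diagram plus a component list. The prompt, the output format and the markdown-per-module layout are assumptions for illustration:

```python
from pathlib import Path
from typing import Callable

DIAGRAM_PROMPT = """Read the following {language} source file and produce:
1. A Mermaid sequence diagram of its main flow.
2. A bullet list of the components/collaborators it interacts with.

--- source ---
{source}
"""


def document_module(path: Path, complete: Callable[[str], str]) -> str:
    """Generate a first-pass design description (markdown) for a single source file."""
    return complete(
        DIAGRAM_PROMPT.format(language=path.suffix.lstrip("."), source=path.read_text())
    )


def document_repo(repo_root: Path, docs_dir: Path, complete: Callable[[str], str]) -> None:
    """Walk the repository and keep one generated design document per module,
    so the output can be reviewed and versioned like any other documentation."""
    docs_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(repo_root.rglob("*.py")):  # extend to other languages as needed
        (docs_dir / f"{path.stem}.md").write_text(document_module(path, complete))
```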
Connecting Static and Dynamic Knowledge
Given that AI agents have an understanding of the system structure, I can see other AI agents working on dynamic knowledge – analyzing logs, traces and other runtime data to provide insights into how the system and its users actually behave, and how the system evolves (through source control). This is more than log and metric analysis. It’s overlaying the available information onto a larger knowledge graph of the system, connecting business behavior to the implementation of the system, including its evolution (e.g. git commits and Jira tickets).
Can we now examine and deduce information about better UX design?
Can we provide insights into the decomposition of the system?
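Sticking with the hypothetical knowledge graph from earlier, one crude way such an overlay could look is annotating graph nodes with signals mined from logs, so the static design links and the dynamic behavior live in the same place (a real system would batch or sample rather than call the LLM per log line):

```python
from collections import Counter
from typing import Callable, Iterable

import networkx as nx

MAP_PROMPT = """Below is a raw log line and a list of known component names.
Answer with the single component name the line most likely belongs to, or "unknown".

Components: {components}
Log line: {line}
"""


def overlay_runtime_signals(
    graph: nx.MultiDiGraph,
    log_lines: Iterable[str],
    complete: Callable[[str], str],  # stand-in for your LLM client
) -> None:
    """Attribute log traffic to components in the knowledge graph and store the counts
    as node attributes, next to the static design/implementation links."""
    components = [n for n, d in graph.nodes(data=True) if d.get("kind") == "component"]
    hits = Counter()
    # Calling the LLM per line is deliberately naive; a real system would sample,
    # batch, or use cheaper classifiers and keep the LLM for the ambiguous cases.
    for line in log_lines:
        name = complete(MAP_PROMPT.format(components=", ".join(components), line=line)).strip()
        if name in components:
            hits[name] += 1
    for name, count in hits.items():
        graph.nodes[name]["observed_log_events"] = count
```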
Enhanced Contextual Assistant and Design Support
At this point we should have everything needed to provide more proactive design support. I can see AI agents we can chat with that help us reason about our designs – where we can suggest a design alternative and ask the agent to assess it and find hidden complexities, in the context of the existing system. Combined with daily deployments and source control, we can probably expect some time estimates and detailed planning as well.
This is where I see the “design sounding board” agent coming into play, as well as agents preemptively telling me where proposed designs might falter.
More importantly, it’s where AI agents start to make the connections to other teams’ work. Telling me where my designs or expected flow will collide with another team’s plans.
Imagine an AI agent that monitors design decisions, of all teams and domains, identifies the flows they refer to, and highlights potential mismatches between teams or suggests extra integration testing, if necessary, all before sprint planning starts. Impact analysis becomes much easier at this point, not because we can query the available data (though we could, and that’s nice as well), but because we have an AI agent looking at the available data, considering the change, and identifying on its own what the impact is.
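And a sketch of that cross-team monitor: gather the latest decision records from every team, let the LLM flag pairs that touch the same flows or components, and publish the result before planning. The directory layout, prompt and output format are all invented for the example:

```python
from pathlib import Path
from typing import Callable

CONFLICT_PROMPT = """Below are recent design decisions from several teams, one section per team.
Identify decisions from DIFFERENT teams that touch the same flows, components or data.
For each such pair, describe the potential mismatch and say whether extra integration
testing seems warranted. Answer in markdown.

{decisions}
"""


def cross_team_report(teams_root: Path, complete: Callable[[str], str]) -> str:
    """Assumes a layout of teams_root/<team>/decisions/*.md and produces a markdown
    report meant to be read before sprint planning starts."""
    sections = []
    for team_dir in sorted(p for p in teams_root.iterdir() if p.is_dir()):
        decisions = "\n\n".join(
            f.read_text() for f in sorted((team_dir / "decisions").glob("*.md"))
        )
        sections.append(f"## Team: {team_dir.name}\n\n{decisions}")
    return complete(CONFLICT_PROMPT.format(decisions="\n\n".join(sections)))
```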
There’s still a long way to go until this is realized. Implementing this vision requires taking into account data access issues, LLM and technology evolution, integration and costs. All the makings of a useful software project.
I also expect quite a bit to change, and new techniques/technologies might make this more achievable or completely unnecessary.
And who knows, I could also be completely hallucinating. I heard it’s fashionable these days.
Conclusion: The Real Promise of LLMs in Software Engineering
I’ve argued here that while vibe coding and code generation get most of the attention, they aren’t addressing the real bottlenecks in software development. The true potential of Large Language Models lies in their ability to understand and process natural language, connect disparate information sources, and mechanize semantic understanding at scale.
LLMs can transform software engineering by tackling the actual challenges we face daily: conquering complexity, reducing communication overhead, maintaining consistency, and analyzing the impact of changes. By creating AI agents that can understand requirements, generate documentation, connect design decisions to implementation, and serve as design thinking partners, we can achieve meaningful productivity improvements beyond simply typing code faster, as nifty as that is.
What makes this vision useful and practical is that it doesn’t eliminate humans from the loop. Rather, it augments our capabilities by handling the heavy lifting of information processing and connection-making, while leaving the intuitive, strategic decisions to experienced engineers. This partnership between human intuition and machine-powered semantic understanding represents a genuine step forward in how we build software.
Are we there yet? Not quite. But we’re closer than ever before, and the path forward is becoming clearer.
Have you experienced any of these AI-powered workflows in your own development process? Do you see other applications for LLMs that could address the real bottlenecks in software engineering?
1. At least, most who publicly talk about it
2. ‘Just set up an API’ is easier said than done – agreeing on the API is the hard work
3. And this is a bit debatable when you consider non-functional requirements
4. I am getting older
5. Also because I don’t feel qualified to argue about it
6. Data mining has been around forever, but mostly works on structured data
7. Admittedly, not a negligible consideration
8. Though from a pure mechanistic point of view, this might not be the most efficient way