I’ve argued before that LLMs’ greatest promise in software engineering lies beyond raw code generation. While producing code remains essential, building scalable, cost-effective software involves far more: requirements, architecture, teamwork and feedback loops. The end goal is of course producing useful and correct software, economically. But the process of producing software, especially as the organization scales, is much more than that.
So how do we adopt AI across a growing software organization—efficiently and at scale?
We’ve gone through[1] paradigm shifts before – agile, microservices and DevOps are some examples. Is AI different in some more profound way, or is it another evolutionary step?
I believe this is a slightly different story compared to other technologies, at least when it comes to the practice of software development.
First, this is an area that’s still being actively researched, with advancements in research and technology being announced all the time. New models and papers drop constantly, fueling FOMO and risk of distraction. Teams can quickly feel overwhelmed without a clear adoption path.
Second, a technology that sits at the intersection of machines and human communication (because of natural language understanding) has the potential to disrupt not only the technical tools we use, but our workflows and working patterns at the same time. AI feels less like another toolchain and more like a collision of Agile and microservices – reshaping not just code, but communication flows themselves. This may be going too far, but I sometimes imagine this is the first time Conway’s law might be challenged.
The AI ecosystem, especially in the software engineering space[2], is abundant with tools and technologies. The current rate of development is staggering, and it’s getting hard to keep up with the tools, patterns and techniques being developed, announced and shared.
Randomly handing teams new AI toys can spark short-term wins. But to unlock AI’s transformative power, we need to be more intentional about it. We need a deliberate adoption roadmap.
Our aim: weave LLMs into daily software engineering to maximize impact. But with tools and standards still maturing, a rigid, long-range plan is unrealistic. There are few substantial case studies that show adoption at scale at this point. Similar to the early days of the World Wide Web, some imagination and extrapolation is required[3], and naturally some of it will be wrong or will need to be updated in the future.
It’s natural to chase faster coding as the low-hanging fruit. Yet AI’s true potential lies in higher-level workflows. Since I believe the potential is much greater, I try to follow a slightly more structured approach to navigating this challenge.
This is my attempt to think through and articulate an approach to AI adoption for a software development organization. It’s positioned as a (very) high-level roadmap for adopting AI in a way that will benefit the organization and will hopefully be viable and efficient at the same time.
This will probably not fit every organization as-is. Specifics of business, architecture, organizational structure and culture will probably require adapting this, even significantly. Still, I believe this can be used as a framework for thinking about this topic, and can serve at the very least as a rough draft for such a roadmap.
I will of course be happy to hear feedback or how others approach this challenge, if at all.
Before diving into details of such a suggested roadmap, I will need to introduce a preliminary concept which I believe to be central to the topic of AI adoption – AI Workspaces.
AI Workspaces
Most AI technology today focuses on transactional tool usage – a user asks something (prompts), and the AI model responds, potentially with some tool invocations. The utility of this flow is limited, mainly because crafting the prompt and providing the context is hard. Some AI tools provide facilities and behind-the-scenes code that injects further context, but this is still localized, and not always consistent. From the user’s point of view it’s still very transactional.
In order to realize more of the potential AI has for simplification and automation, we need to consistently provide context that is kept up to date and applied whenever needed. We need to combine AI tools with the relevant up-to-date context so that more complicated tasks can be achieved. Also, the more autonomy AI has, the easier it will be for users to apply it successfully.
I’m proposing that we need to start thinking about an “AI workspace”.
An AI workspace is a combination of:
- Basic AI tools, e.g. models used, MCP servers, with their configuration.
- Custom prompts, usually focused on a task or set of tasks in some domain.
- Persistent memory – a contextual knowledge source, potentially growing with every interaction, that is relevant to tasks the AI is meant to address.
The combination of these, using different tools and techniques, should provide a complete framework for AI agents to accomplish ever more complex tasks. The exact capabilities depend of course on the setup, but the main point is that all of these elements, in tandem, are necessary to create more elaborate automation.
A key point here is the knowledge building – the persistent memory. I expect that an AI workspace is something that’s constantly updated (automatically or by the user) so the AI can automatically adapt to changing circumstances, including other AI-based tasks. There should be a compounding effect of knowledge building over time and being used by AI to perform better and more accurately.
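To make the three ingredients concrete, here is a minimal Python sketch of a workspace as a data structure. All names (`ToolConfig`, `AIWorkspace`, `remember`, `build_context`) are hypothetical and exist only for illustration; a real workspace would plug in actual models, MCP servers and a proper memory store rather than an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToolConfig:
    """A basic AI tool: a model or an MCP server plus its settings."""
    name: str
    kind: str  # e.g. "model" or "mcp_server" (labels are hypothetical)
    settings: Dict[str, str] = field(default_factory=dict)


@dataclass
class AIWorkspace:
    """Tools, task prompts and persistent memory bundled together."""
    name: str
    tools: List[ToolConfig] = field(default_factory=list)
    prompts: Dict[str, str] = field(default_factory=dict)  # task -> prompt
    memory: List[str] = field(default_factory=list)        # accumulated facts

    def remember(self, fact: str) -> None:
        """Add a fact to persistent memory, skipping exact duplicates."""
        if fact not in self.memory:
            self.memory.append(fact)

    def build_context(self, task: str, request: str) -> str:
        """Combine the task prompt, everything remembered so far, and the request."""
        known = "\n".join(f"- {fact}" for fact in self.memory)
        return f"{self.prompts[task]}\n\nKnown context:\n{known}\n\nRequest: {request}"


# Example usage with made-up content.
ws = AIWorkspace(
    name="billing-incidents",
    prompts={"triage": "You triage billing incidents."},
)
ws.tools.append(ToolConfig(name="ticket-search", kind="mcp_server"))
ws.remember("Invoices are generated by the nightly batch job.")
ws.remember("Invoices are generated by the nightly batch job.")  # ignored duplicate

context = ws.build_context("triage", "Customer reports a duplicate invoice.")
print(context)
```

The point of the sketch is the compounding effect: every call to `remember` changes what later calls to `build_context` produce, without the user having to restate that knowledge in each prompt.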
An AI workspace should be customized for a specific task or set of tasks. But it can be more useful if it is customized for a complete business flow that brings together disparate systems and roles in the organization. This will arguably make the workspace more complex and harder to set up, but if used consistently over time, the overhead might be worth it.
We’re already seeing first signs of this (e.g. Claude Projects), but I expect this to go beyond the confines of a single vendor platform, potentially involving several different models, and be open to updates and reads by agents[4].
A Roadmap – General Framing
As I’ve already noted, using AI, in my opinion, is more than simply automating some tasks. Automating is great, and provides value, but the potential here is much greater. In order to realize the greater potential we need to leverage the strengths of LLMs, and point them at the right challenges we face in our day to day work in software development.
And these strengths generally boil down to:
- Understanding natural language (and other, more formal, languages)
- Being able to respond and produce content in natural language (and other, more formal, languages)
- Understanding patterns in their input, reasoning about them, and applying patterns to their output.
And do all of this at scale.
Looking at the challenges of software development, our general bottlenecks are less in code production, and more in understanding, communicating and applying our understanding effectively. This includes understanding existing code, troubleshooting bug reports, understanding requirements, understanding system architecture, anticipating impact, translating requirements to plans etc.
Apart from actual problems we might face in all of these, there’s also a challenge of scale here. The more people are involved in the software production (larger organization), the larger the codebase and the more clients we have – the greater the challenge.
An immediate corollary of the way (non-trivial) software is built is that it’s not just a problem of software developers. There are more people involved in building, evolving and maintaining the software – DevOps engineers, product managers, designers, customer support, etc. Many of the challenges stem from the different roles involved, and from their different communication patterns and motivations.
So when it comes to adopting a technology that has the potential to encompass different workflows and roles, I’m looking at adoption from different angles.
Since this is a roadmap, there’s naturally a general time component to it. But I’m also looking at it using a different axis – the way different roles or workflows (tasks?) adopt AI, and at what point these workflows converge, and how exactly.
The general framing of the roadmap is therefore a progression across phases of different verticals of “types of work” or roles if you will.
Workflow Verticals
When building software[5] we have different tasks, performed by separate cooperating professionals. I’d like to avoid the discussion on software project management methodologies, so suffice it to say that different people cooperate to produce, evolve and maintain the software system, each with more or less well-defined tasks[6].
Roughly speaking these workflows are:
- Design and coding of the software: anything from infrastructure to application design, prototyping, implementation and debugging.
- Testing and quality: measuring and improving quality processes – generating tests, measuring coverage, simulating product flows, assessing usage.
- Incident management: identifying and troubleshooting issues (bugs or otherwise), at scale. This includes also customer facing support.
- Product and project management: analyzing market trends and requirements, guiding the product roadmap, rolling out changes, synchronizing implementations across teams.
- Operations and monitoring: monitoring the system behavior, applying updates, identifying issues proactively, etc.
All of these tasks are part of what makes the software, and operates it on a daily basis. There’s obviously some overlap, but more importantly there are synergies between these roles. People fulfilling these roles constantly cooperate to do their job.
People in these roles also have their own tools and processes, each in its own domain, with the potential to be greatly enhanced by AI. We’re already seeing a plethora of tools promising, with varying[7] degrees of success, to optimize and improve productivity in all of these areas.
To name a few examples:
- Software coding is obviously being disrupted by AI-driven IDEs and agents.
- Product management can leverage AI for analyzing market feedback, producing and checking requirements, simulating “what-if” scenarios, researching, etc.
- Incident management can easily benefit from AI analyzing logs, traces and reports, helping to provide troubleshooting teams with relevant context and analysis of issues.
- Tests can be generated and maintained automatically alongside changing code.
- UX design can go from drawing to prototype in no time.
And I’m sure there are more examples I’m not even aware of. The list goes on.
The point here is not to exhaustively list all the potential benefits of AI. Rather, I argue that for the software organization to effectively leverage AI, it needs to do it across these “verticals”.
And as the organization and the technologies mature, we have better potential to leverage cooperation and synergies between these verticals.
This won’t happen immediately. It probably won’t happen for a while, if at all. But for that, we need to talk about phases of adoption.
Phases of Adoption
I try to outline here several phases for the adoption of AI. These phases are not necessarily clearly distinct, and progress across them is probably neither linear nor constant. The point of this description is not so much to provide a concrete timeline. It is more about describing the main driving forces and the potential value we can gain at each phase. Understanding this should help us better plan and articulate more concrete steps for realizing the vision.
You can look at these phases as a sort of “AI Maturity Level”, although I’m not trying to provide any kind of formal or rigorous definition to this. It’s more of a mindset.
Phase 1: Exploration and Basic Usage
At this phase, different teams explore the possibilities and tools available for AI usage. The current rate of innovation in this field, especially around software development is extremely high. Given this, I expect employees in different roles will experiment and try various tools and techniques, trying to optimize their existing workflows in one way or another.
At this point, the organization drives for quick wins, where people in different roles leverage AI tools for common tasks, share knowledge internally and learn from the community.
Covered scenarios at this point are localized to specific workflows and focus mainly on providing context to localized (1-2 people) tasks, as well as automation or faster completion of such localized tasks.
LLM and AI usage at this point is very much triggered and controlled by humans requesting and reviewing results. This work is very much task/workflow oriented at this point, with AI tools serving specific focused tasks. The human-AI interaction at this point is very transactional and limited in scope.
The organization should expect to gain the fundamental knowledge required for deploying and using the different tools securely and in a scalable manner, including performance, cost, operations, etc. At this phase, a lot of experimentation and evaluation is happening. It is a good time to establish an internal community driving the tooling and adoption of AI. The organization should expect several quick wins and localized productivity gains.
I expect the learning curve to be steep in this phase, so a lot of what happens here is trial and error and comparison of different tools, techniques and models.
AI workspaces at this point, if they exist, are very much focused on the localized context of individual well-defined tasks. They are also probably harder to establish and operate (integrate tools, add information).
What would be the expected value?
Phase 1 focuses on achieving quick wins and localized productivity gains. By implementing AI code assistants, automated code reviews, AI-generated tests, and anomaly detection tools, the organization can quickly demonstrate immediate developer speedups, improved code quality, faster test coverage, and early incident learning.
This goes beyond a business benefit. It’s also a psychological hurdle to overcome. Concrete wins, such as fewer bugs and faster releases, build momentum and justify further investment in AI adoption while increasing developer satisfaction.
In addition, there’s going to be considerable technical infrastructure investment done at this point, e.g. model governance, cost management, etc. This infrastructure should be leveraged in the following phases as well, and is therefore critical. This phase provides a strong foundation for leveraging AI in future stages.
Phase 2: Grounding in Domain-Specific Knowledge
At this phase, having gained basic proficiency, the organization should expect to improve performance and scope of AI-enabled tasks by starting to build and expose organization-specific knowledge and processes to LLM models.
I expect that business-specific information (internal or external) can increase performance and open up possibilities for more tasks that can be improved using AI. Examples of knowledge building include better code and design understanding, understanding of the relationships between different deployed components, connecting product requirements to code and technical artifacts, etc.
This can open the road to higher level AI-driven tasks, like analyzing and understanding the impact of different features, simulating choices, detecting inconsistencies in product and technical architecture and more.
A key aspect of this phase is to facilitate a consistent evolution of the knowledge so it can be scaled and maintain its efficacy. At this point, the organization needs to have the infrastructure and efficient standards in place so information can be shared between roles, and between different AI-driven tools and processes.
In this phase AI workspaces become more robust and prevalent, encompassing a larger context, and even crossing workflow verticals in some cases. Contrast this with the workspaces we’ve seen in the first phase, which are more focused on localized contexts.
This phase is also when we start thinking in “AI Systems” instead of simply using AI tools. This is where we consistently apply and use AI workspaces, with several tools (AI or non-AI) being combined with the same knowledge base, and evolve it together.
An example of this would be AI coding agents that automatically connect their implementation to JIRA tickets and product requirements, and record this knowledge. Other AI agents can then leverage this knowledge to map it to design decisions and test coverage reports (how much of the product requirements is tested) and to plan rollouts.
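A minimal Python sketch of that shared record, assuming a simple in-memory store: one agent links requirements to tickets, commits and tests, and another reads the same links to report how many requirements have at least one test. The class name `KnowledgeBase`, its methods, and all identifiers such as `REQ-1` and `PROJ-101` are hypothetical; a real setup would sit on top of the ticketing system and repository APIs.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class KnowledgeBase:
    """Shared store of links between requirements and related artifacts."""

    def __init__(self) -> None:
        # requirement id -> set of (artifact kind, artifact id)
        self._links: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def link(self, requirement: str, kind: str, artifact: str) -> None:
        """Record that an artifact (ticket, commit, test...) relates to a requirement."""
        self._links[requirement].add((kind, artifact))

    def artifacts(self, requirement: str, kind: str) -> List[str]:
        """Return all artifacts of one kind linked to a requirement, sorted."""
        linked = self._links.get(requirement, set())
        return sorted(artifact for k, artifact in linked if k == kind)

    def requirement_coverage(self, requirements: List[str]) -> float:
        """Fraction of the given requirements that have at least one linked test."""
        if not requirements:
            return 0.0
        covered = sum(1 for r in requirements if self.artifacts(r, "test"))
        return covered / len(requirements)


# A coding agent records what it did (all identifiers are made up).
kb = KnowledgeBase()
kb.link("REQ-1", "ticket", "PROJ-101")
kb.link("REQ-1", "commit", "abc123")
kb.link("REQ-1", "test", "test_export_csv")
kb.link("REQ-2", "ticket", "PROJ-102")

# A reporting agent later reads the same knowledge.
coverage = kb.requirement_coverage(["REQ-1", "REQ-2"])
print(f"Requirements with tests: {coverage:.0%}")  # Requirements with tests: 50%
```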
What value can we expect to have at this point?
Phase 2 is mainly around integrating company-specific (and company-wide) knowledge with AI workspaces. At this point I expect existing workloads to be more accurate, precise and faster in doing their work, even if the task is limited in scope. The grounding provided by the specific knowledge graph should improve the accuracy of AI models.
Different workflow verticals will start to cooperate more closely at this point. First of all, by building a knowledge graph/base together. But also by leveraging this combined knowledge to implement simple agentic workflows, where AI-based agents start to reason on the data and make simple decisions.
Phase 3: Autonomous Cross Team Workflows
This is the point where previous infrastructure starts to really pay off in terms of increased productivity and quality.
At this phase of adoption, I expect we’ll see more autonomous AI-driven processes coming to fruition. And when I say “AI-driven” I’m not referring to simply automating a well-known process. I’m referring to AI agents reasoning and dynamically using tools and other agents to adapt and to produce results or carry out tasks[8]. I expect at this point AI agents can also build their own knowledge, and adapt their work to accommodate changes in the environment.
Humans are still in the loop for critical decision making, but the friction between humans and tools, and between humans and humans, is significantly reduced[9]. The focus at this point should be on eliminating bureaucracy and increasing the adoption of consistent and increasingly robust workflows. This generalization also means that agentic AI systems now work across roles and departments; this is where the workflow verticals start to converge.
Examples of this would be:
- Managing changes across roles and workflows. For example, a change in UX/product feature definition that is automatically reflected in plans, and rolled out to clients.
- Technical design that is validated against technical dependencies (from other teams), past decisions and project plans. This could include updating the dependencies and informing other agents, potentially changing their decisions as a result.
- Identifying cross-cutting issues from internal conversations, correlated with support tickets and other metrics, and proactively planning and suggesting resolutions.
At this phase, I expect AI workspaces to become really cross-departmental and leverage knowledge being built and added in different verticals.
Ad-hoc exploration and automation of tools should also be possible. At this point, the organization should have a strong foundation of tooling and experience with applying AI. It should be possible to allow ad-hoc building of new flows on top of the existing LLM infrastructure and the ever-evolving organizational knowledge base.
Note that this also poses a challenge: there is a fine line between standardization of tools, which drives efficiencies at scale, and democratization of capabilities. You want people to experiment and find new ways to optimize their work, but in order to efficiently grow you’ll need to apply some boundaries to what is used and how it’s used. This tradeoff isn’t unique to AI systems, but I believe it will become more emphasized when we consider new directions and applications of LLMs as the technology improves.
In terms of expected value, we should expect significant productivity gains. While humans are still in the loop, AI will further automate processes, reducing bureaucracy. The focus will be on adoption of consistent productive workflows across roles and departments. Human focus should be on innovation and decision making at this point, with accurate and reliable information being made available to humans, by the machines[10].
Technical Infrastructure
In order to support this process, looking at the expected phases of adoption, we should pay attention to and plan the necessary technical infrastructure investment. This is true of adopting any new technology, but with the current explosion of tools and techniques, it’s very easy to lose focus.
I won’t pretend to know exactly which tools should be available at what point. Nor do I expect to provide a definitive list of tools or compare them at this point[11]. But in order to plan investments ahead, and make a concerted effort at learning what will help us, I believe we can give some idea of what will be needed at each phase of adoption.
In phase 1, we naturally explore a plethora of tools. We should be able to facilitate new models for different use cases. Enabling access to different models using tools that provide a (more or less) uniform facade is useful. Examples of this are OpenWebUI and LiteLLM. We should provide access to AI-driven IDEs, like Cursor, Windsurf and similar ones.
For non-development workflows, AI-based prototyping tools and vendor-specific AI extensions should be helpful. The same goes for monitoring tools.
Connecting these tools through MCP servers to existing MCP hosts (IDEs, chat applications, etc.) would probably be useful as well, so support for installing and monitoring MCP servers is worth providing. At this point it should also be useful to establish some way to measure the effectiveness of prompts or model tuning, and to track usage of the various tools.
In phase 2, building on and potentially maturing the infrastructure from phase 1, we should start focusing on more robust workflows and on knowledge building. Depending on use cases, it could be useful to look at agent workflow frameworks (LangChain et al.) and agent authoring tools (e.g. n8n).
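As a rough illustration of what a uniform facade buys, here is a small Python sketch of a gateway that routes each use case to a registered model behind a single call. This is not the API of OpenWebUI or LiteLLM; `ModelGateway`, its methods and the stand-in backends are assumptions made up for this example.

```python
from typing import Callable, Dict

# Assumption: a backend is any callable that maps a prompt to generated text.
Backend = Callable[[str], str]


class ModelGateway:
    """One entry point that routes each use case to a registered model."""

    def __init__(self, default_model: str) -> None:
        self._backends: Dict[str, Backend] = {}
        self._routes: Dict[str, str] = {}
        self._default_model = default_model

    def register(self, model_name: str, backend: Backend) -> None:
        """Make a model available under a name."""
        self._backends[model_name] = backend

    def route(self, use_case: str, model_name: str) -> None:
        """Pin a use case to a specific registered model."""
        if model_name not in self._backends:
            raise KeyError(f"unknown model: {model_name}")
        self._routes[use_case] = model_name

    def complete(self, use_case: str, prompt: str) -> str:
        """Send the prompt to the model chosen for this use case."""
        model_name = self._routes.get(use_case, self._default_model)
        return self._backends[model_name](prompt)


# Stand-in backends; real ones would call a hosted or local model.
gateway = ModelGateway(default_model="general")
gateway.register("general", lambda prompt: f"[general] {prompt}")
gateway.register("code", lambda prompt: f"[code] {prompt}")
gateway.route("code-review", "code")

print(gateway.complete("code-review", "Review this diff"))
# [code] Review this diff
print(gateway.complete("summarize", "Summarize the incident"))
# [general] Summarize the incident
```

Callers only name a use case, so swapping the model behind "code-review" during the exploration phase is a one-line change in the routing rather than a change in every tool.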
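One simple way to start measuring prompt effectiveness is to record, per tool and prompt version, how often a human accepted the result. The Python sketch below assumes acceptance is a usable signal, which will not hold for every task; `UsageTracker`, `PromptStats` and the prompt identifiers are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PromptStats:
    """Counters for one prompt version used with one tool."""
    uses: int = 0
    accepted: int = 0

    @property
    def acceptance_rate(self) -> float:
        return self.accepted / self.uses if self.uses else 0.0


class UsageTracker:
    """Tracks how often each (tool, prompt) pair is used and accepted."""

    def __init__(self) -> None:
        self._stats: Dict[Tuple[str, str], PromptStats] = defaultdict(PromptStats)

    def record(self, tool: str, prompt_id: str, accepted: bool) -> None:
        """Log one use and whether the human accepted the result."""
        stats = self._stats[(tool, prompt_id)]
        stats.uses += 1
        stats.accepted += int(accepted)

    def stats(self, tool: str, prompt_id: str) -> PromptStats:
        """Return counters for a pair, or empty counters if never used."""
        return self._stats.get((tool, prompt_id), PromptStats())

    def best_prompt(self, tool: str) -> Optional[str]:
        """Return the prompt with the highest acceptance rate for a tool."""
        candidates = {pid: s for (t, pid), s in self._stats.items() if t == tool}
        if not candidates:
            return None
        return max(candidates, key=lambda pid: candidates[pid].acceptance_rate)


# Example with made-up tool and prompt names.
tracker = UsageTracker()
tracker.record("ide-assistant", "review-v1", accepted=True)
tracker.record("ide-assistant", "review-v1", accepted=False)
tracker.record("ide-assistant", "review-v2", accepted=True)
tracker.record("ide-assistant", "review-v2", accepted=True)

print(tracker.best_prompt("ide-assistant"))  # review-v2
```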
Additionally, knowledge management tools and processes will probably be useful to introduce – easily configured RAG processes (and therefore vector DBs), memory management techniques, maybe graph databases. This of course all depends on the techniques used for memory building and maintenance.
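To show the retrieval step of a RAG process in isolation, here is a deliberately simplified Python sketch. It ranks documents by cosine similarity over word counts; this is a stand-in assumption, since a real setup would use embeddings stored in a vector DB. The function names and sample documents are made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words count vector (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


# Made-up organizational knowledge snippets.
docs = [
    "The billing service emits invoices nightly.",
    "The auth service issues JWT tokens.",
    "Deployments run through the staging cluster first.",
]

top = retrieve("How are invoices emitted by billing?", docs, k=1)
print(top[0])  # The billing service emits invoices nightly.
```

The retrieved snippets would then be placed into the model's prompt as grounding context; that generation step is left out here.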
I expect MCP servers, especially ones specialized for the organization’s code and other knowledge systems, will become more central. It should be possible to also create necessary MCP servers that will allow LLMs to access and use internal tools.
In phase 3, I expect most of the technical features to be in place. This will be a phase where the focus is more on optimizing costs and improving performance. We should probably be looking at ways to use more efficient models and match models to tasks, potentially fine-tuning models by whatever method fits.
Monitoring the operation and costs of agents, understanding what happens in different flows will become more critical at this point, especially when usage scales up in the organization, and AI adoption increases, across departments.
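A minimal Python sketch of such cost monitoring, assuming each agent call is logged with its model and token count. The prices, model names, agent names and the `AgentCall` record are all hypothetical; real prices vary by vendor and model, and usually differ for input and output tokens.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical prices in USD per 1,000 tokens.
PRICE_PER_1K = {"small-model": 0.5, "large-model": 4.0}


@dataclass(frozen=True)
class AgentCall:
    """One logged model call made by an agent."""
    agent: str
    department: str
    model: str
    tokens: int


def cost_by(calls: List[AgentCall], key: str) -> Dict[str, float]:
    """Sum call costs grouped by an AgentCall field, e.g. 'agent' or 'department'."""
    totals: Dict[str, float] = defaultdict(float)
    for call in calls:
        totals[getattr(call, key)] += call.tokens / 1000 * PRICE_PER_1K[call.model]
    return dict(totals)


calls = [
    AgentCall("triage-bot", "support", "small-model", 2000),
    AgentCall("design-checker", "engineering", "large-model", 500),
    AgentCall("triage-bot", "support", "large-model", 1000),
]

print(cost_by(calls, "agent"))
# {'triage-bot': 5.0, 'design-checker': 2.0}
print(cost_by(calls, "department"))
# {'support': 5.0, 'engineering': 2.0}
```

Grouping the same log by agent, department or model is what makes it possible to spot where a cheaper model could be matched to a task.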
Summary
AI stands to transform software engineering far beyond code generation. Realizing that promise demands coordinated learning, infrastructure and a phased roadmap. This framework offers a starting point.
I believe that due to the nature of the technology, it goes beyond simple tool adoption, or alternatively adopting a new project management practice. This has the potential to change both aspects of work.
The structure I’m proposing is to highlight the potential in each “stream” of workflow vertical, and adopt the tools in phases of maturity, as the ecosystem evolves:
[Figure: adoption roadmap by workflow vertical and phase of maturity]
This visualization is only an illustration of course. You’ll note it’s laid out as a “layer cake” where scenarios for using AI are roughly laid out on top of other use cases/scenarios which should probably precede them.
This is of course not an exhaustive list.
The attempt here is of course to structure the process into something that can be further refined and hopefully result in an actionable plan. At the very least, it should serve as a guideline on where to focus research, learning and implementation efforts, to bring value.
It would be nice to know what other people are thinking when trying to structure such a process; or what the AI thinks about this.
On to explore more.
[1] Dare I say “weathered”?
[2] SW engineers being natural early adopters for this technology.
[3] And we know how some attempts didn’t end well.
[4] To be honest, I did not yet dive into Claude Projects, so it’s possible they support this. But I can imagine something similar done with other tools as well.
[5] And probably in other industries as well, but I know software best.
[6] I realize this is kind of hand-wavy, but bear with me. Also, you probably know what I’m talking about.
[7] Ever increasing?
[8] In a sense, leveraging test-time compute at the agentic system level.
[9] Although in some cases, friction is desirable – think of compliance, cost management, etc.
[10] I guess accurate context is also important for humans, who would’ve guessed.
[11] And let’s face it, at the rate things are going right now, by the time I finish writing this, there will be new tools.