Exploring Vibe Coding with AI: My Experiment

In my previous post I mentioned vibe coding as a current trend of coding with AI. But I hadn’t actually tried it.

So I’ve decided to jump on the bandwagon and give it a try. Granted, I’m not the obvious target audience for this technique, but before passing judgment I had to see/feel it for myself.

It’s not the first time I’ve generated code using an LLM with some prompting. But this time I was more committed to trying out “the vibe”. To be clear, I did not intend to go all in with voice commands, transcription, and watching Netflix while the LLM worked. I did intend to review the code, and keep in touch with the output at every point. I wanted to test the tool’s capabilities while still being very much aware of what was going on.

Below is an account of what happened, my thoughts and conclusions so far.
A general disclaimer is of course in order: I’m still exploring these tools, and it’s quite possible the process could be improved. My view, however, is very much shaped by my experience as a developer. My choice of tools and how to use them is therefore biased towards an experienced developer looking to increase productivity, not a non-coder looking to crank out one-off applications[1].

The Setup

I set out to create a new simple tool for myself (actually to be used at work): something I actually find useful, that is not an obvious side project that’s been done a million times, and is therefore less likely (I hope) to be in the LLM’s training data. It’s a project done from scratch, and I’m trying to do something that I don’t have a lot of experience with. It is also meant to be fairly limited in scope.

The project itself is a “Knowledge Graph Visualizer”, essentially an in-browser viewer of a graph representing arbitrary concepts and their relationships. I intended this to be purely in-browser JS code. The main feature is a 3D rendering of the graph, allowing navigation through the concepts and their links. You can see the initial bare specification here.

To get a feel for the project, here’s a current screenshot:

KG-Viewer showing its own knowledge graph

With respect to tooling I went with Cursor (I use Cursor Pro), primarily using the Claude 3.7 Sonnet model. The initial code generation was actually done with Gemini 2.5 Pro, but I quickly ran out of credits there. So the bulk of the work was done with Cursor.

I did not use any special Cursor rules or MCP tools. This may have altered the experience to a degree (though I doubt it), so I will need to continue trying it as I explore these tools and techniques.

Getting Into the Vibe

It actually started out fairly impressively. Given the initial spec, Gemini generated 6 files that provided the skeleton for a proof of concept. All of these files are still there. I did not look too deeply into the generated code. Instead, I initialized an empty simple project, launched Cursor, and copied the files there. With a few tweaks[2], it worked. I had a working POC in about one hour of work, without ever having coded 3D renderings of graphs.

Magic!
I’ll be honest – I was impressed at first. I got a working implementation for drawing a graph with Three.js, for some JSON schema describing a graph. Given that I had never laid eyes on Three.js, this was definitely faster than I would have gotten to even this simple POC on my own.
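
The post does not show the JSON schema itself, so the following is only a rough sketch of what a graph description of this kind might look like and how it could be sanity-checked before rendering. The field names (`nodes`, `links`, `id`, `label`, `source`, `target`) and the helper functions are assumptions made for illustration, not the tool's actual format.

```python
import json

# Hypothetical graph document; the field names are assumptions for
# illustration, not the schema the tool actually uses.
EXAMPLE_GRAPH = """
{
  "nodes": [
    {"id": "llm", "label": "LLM"},
    {"id": "context", "label": "Context"},
    {"id": "agent", "label": "Coding agent"}
  ],
  "links": [
    {"source": "agent", "target": "llm", "label": "uses"},
    {"source": "llm", "target": "context", "label": "is limited by"}
  ]
}
"""


def load_graph(text):
    """Parse a graph document and index its nodes by id."""
    data = json.loads(text)
    nodes = {node["id"]: node for node in data["nodes"]}
    return nodes, data["links"]


def dangling_links(nodes, links):
    """Return the links whose source or target is not a known node id."""
    return [
        link for link in links
        if link["source"] not in nodes or link["target"] not in nodes
    ]


def neighbors(node_id, links):
    """Return the ids directly linked to node_id, in either direction, sorted."""
    found = set()
    for link in links:
        if link["source"] == node_id:
            found.add(link["target"])
        elif link["target"] == node_id:
            found.add(link["source"])
    return sorted(found)


nodes, links = load_graph(EXAMPLE_GRAPH)
```

A check like `dangling_links` is the sort of thing worth running before handing the data to a renderer, since a link pointing at a missing node tends to fail in less obvious ways later on.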

I did peek at the code. I wasn’t overly impressed by it – there was a lot of unnecessary repetition, very long functions, and some weird design choices. For example, having a style.css holding all the style classes, but at the same time generating a new style and dynamically injecting it into the document.
But, adhering to my “viber code”, I did not touch the code, instead working only with prompts.

Then I started asking for more features.

Cursor/Claude, We Have a Problem

A POC is nice. But I actually need a working tool. So I started asking for more features.
Note, I did not just continue to spill out requests in the chat. I followed the common wisdom – using a new chat instance, laying out the feature specification and working step by step on planning and testing before implementation.

I wrote a simple file, which should allow me to trace the feature’s spec and implementation.
The general structure is simple:

- Feature Specification
- Plan
- Testing
- Implementation Log

Where I fill in only the Feature Specification, and let Cursor fill in the plan (after approval) and the “Implementation Log” as we proceed.

The plan was to have a working log of progress, serving both as a record of the work and as context for future chat sessions.

I don’t intend to re-create here the entire chat session or all my prompts, as this is not intended to be a tutorial on LLM techniques. But it’s fair to say that the first feature (data retrieval) was implemented fairly easily, using only prompts.
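
As a rough illustration of the feature file described above, here is a small sketch that produces such a skeleton with only the specification filled in. The four section names come from the list; the plain-text layout, the placeholder wording and the example feature are assumptions, not the actual file used in the project.

```python
# Section names come from the list above; the file layout is an assumption.
SECTIONS = ["Feature Specification", "Plan", "Testing", "Implementation Log"]

PLACEHOLDER = "(to be filled in by the agent)"


def feature_file(title, specification):
    """Build the text of a feature file with only the specification filled in."""
    lines = [f"Feature: {title}", ""]
    for section in SECTIONS:
        lines.append(section)
        lines.append("-" * len(section))
        if section == "Feature Specification":
            lines.append(specification)
        else:
            lines.append(PLACEHOLDER)
        lines.append("")
    return "\n".join(lines)


# Hypothetical feature, used only to show the resulting structure.
text = feature_file("Node labels", "Show a text label next to every node.")
```

Keeping the file as plain text means the same document can be read by a human and pasted into a new chat session as context.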

Just One Small Change…

I was actually still pretty impressed at this point, so I simply asked for a tiny feature: showing node and link labels. I did it without creating an explicit “feature file”.

The code didn’t work. So I asked Cursor to fix it. And this quickly spiraled out of control. Cursor’s agent of course notified me on every request that it had definitely figured out the issue, and now had the fix (!).
It didn’t.

I remained loyal to the “vibe coder creed”, and did not try to debug/fix the code myself. Instead, I deliberately went in cycles of prompting for fixes, accepting changes blindly, testing, and prompting again with new errors.

Somewhere along this cycle, the code changes made by the agent actually created a regression in the application’s code, resulting in the application not loading at all.

After roughly 3 hours, and a lot more grey hair, I did notice that the Cursor agent was going in circles – simply trying out the same 3 solutions, with no idea what’s wrong. But still confidently hallucinating solutions (“Now I see the issue…”[3]).

This was so frustrating that at this point I simply took it upon myself to actually look at the code, which was a complete mess. I looked at the problematic code, consulted git diffs to restore basic functionality, and solved the actual issue with about 10 more minutes of Google search.

To be fair, from my very rudimentary Google search it seemed my request (link labels) wasn’t that easy to achieve. It’s apparently not that obvious (again, speaking as someone who is no expert on Three.js). I relaxed the requirement a bit, and found a simple solution.
Still, the whole cycle of back and forth of code changes, especially to unrelated code, was very much counter-productive. The vibes were all wrong. Getting back to working code took another 2-3 hours.

At this point I was thinking “oh well, you can’t win them all”. I wanted to turn to something simple. And looking at the state of the code, a simple cleanup should be easy enough, right?

Right? …

Now It’s Just Cleanup

Well … it depends.

I went back into “vibe coding” mode. This time, I defined very basic code cleanup procedures. I then asked Cursor’s agent (in a new session), to go through the source code and follow these steps to clean it up.

It actually did reasonably well for small files. The bigger files proved to be more challenging. Trying to clean them up ended up messing up the files completely. For some reason, the LLM agent removed functioning code, and created functionality regressions. Trying to quickly fix them ended up causing more issues. It was clearly guessing at this point.

Given my battle scars with the previous feature request, I avoided this hallucination death spiral. Instead, I went through git history, found a working version, and restored the working code “by hand” – actually typing in code. I wasn’t a vibe coder anymore, but the application worked, the code was cleaner, and my blood pressure remained fairly low (I think).

The experience felt like trying to mentor a junior developer to code without creating regressions. The problem is it’s a fast and confident junior developer, with short term memory loss, who is apparently so eager to please that they simply spew out code that looks remotely connected to the problem at hand, with little understanding of context; proving to be ignorant even of changes they themselves made to the code.

Documentation for Man and Machine

At this point I decided to go back to basics, where LLMs truly shine – understanding and creating text. I asked it to create documentation for specific flows in the code (init sequence, clicking on a legend item). Unsurprisingly, with a few simple prompts, the agent produced decent documentation for what I asked, including mermaid.js diagram code.

This is important not simply because it allowed me to document the project easily, which is nice. Creating textual documentation of a specific flow also allowed me to provide better context for other chat sessions. And this is an important insight – textual descriptions of the code are useful for humans as well as for LLMs.
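
To give a feel for what “diagram as code” looks like, here is a minimal sketch that renders a list of steps as Mermaid sequence-diagram text. The flow and the component names (`Page`, `Loader`, `Renderer`) are invented for illustration; the actual init sequence of the application is not shown in this post.

```python
def sequence_diagram(steps):
    """Render (sender, receiver, message) steps as Mermaid sequence diagram text."""
    lines = ["sequenceDiagram"]
    for sender, receiver, message in steps:
        lines.append(f"    {sender}->>{receiver}: {message}")
    return "\n".join(lines)


# Hypothetical init flow; the component names are made up for illustration.
init_flow = [
    ("Page", "Loader", "fetch graph JSON"),
    ("Loader", "Renderer", "pass nodes and links"),
    ("Renderer", "Page", "draw 3D scene"),
]
diagram = sequence_diagram(init_flow)
```

The resulting text is short, diffable, and readable both by a person and by a model, which is what makes it usable as context for a later chat session.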

Other Features

At this point I turned to developing more features – loading data and “node focus”. In both cases I went back to providing feature files, with specifications, and asking the agent to update the files with plans and implementation logs.

I was a bit more cautious now. I reviewed code more carefully and intervened where I felt the code wasn’t good. In some cases it was obvious the code wasn’t functionally correct, but instead of trying to “fight” with the agent, I accepted the code and went on to change it myself.

A repeating phrase in all my prompts at this point was:

Do minimal code changes. Change only what is needed and nothing more.

This, combined with being more cautious and careful, produced pretty good results. I managed to implement two features in a short time. Probably a bit shorter than it would have taken me to run through Three.js tutorials and do it myself.

Final Thoughts

So where does this leave me?

I have a working application. And if I had to learn Three.js from scratch myself, it would have taken me considerably longer to create. It’s working, and it’s useful. This is an important bottom line.

Small Application, Good Starting Point

The initial code, generated by the LLM (Gemini or Claude), does serve as a good starting point, especially in areas or frameworks that are unfamiliar to the developer.

But this is still a far cry from replacing developers. There are tool limitations, some of them, I expect, introduced by Cursor rather than the LLM. These limitations can cause havoc if the agent is left to proceed with no oversight.
And review is harder when there’s a ton of unorganized code[4].

We can probably make it better with rules, better prompts, and a combination of agents. And of course advances in LLM training.

This is a good starting point. But we need to remember this is a very small application, made from scratch. In the real world, a lot of use cases are not that simple at all. The more I read and think about it, the more this bears a striking resemblance to no-code/low-code tools. In those cases too, it’s easy to achieve quick results for simple use cases, but very hard to scale development when features creep in or the application needs to scale.

It’s not that low-code tools don’t have their place. They serve a very specific (viable) niche. But as experience shows, they haven’t replaced developers.

Could this be different?
What would it take to tackle more serious challenges, with “vibe coding”?

Context is King

It’s quite obvious that in the kingdom of tokens, amidst ramparts of code and wind all of chat messages, there is only one king, and its name is Context[5]. As LLMs are limited in their context size, and a lot of it is taken up by wrapping tools (Cursor in this case), context for an LLM chat is expensive real estate.

So while context windows can get big, we’ll probably never have enough when we get to more complicated tasks and bigger code bases. There’s a preservation of complexity at play.

Accuracy and precision in the context play a crucial role in effectiveness. Context passed to LLMs needs to be information-dense. We should probably start considering how efficient the context we’re providing to LLMs is. I don’t know how to measure context efficiency yet, but I believe this will be important to be more effective as tasks become more complicated.

But there’s more than just the LLM and how to operate it.

You’re Only as Good as Your Tools, Also When Vibing

It’s quite clear that mistakes made by LLMs, and humans, can be avoided/caught with the help of the right tools. Even in my small example described above, cooperation of the LLM agent with external tools (console logs, shell commands) resulted in better understanding and a more independent agent.

I suspect that having more tools, e.g. a relevant MCP server for documentation, can significantly help. I expect the integration of LLMs with tools will become more prominent and more necessary to create more independent coding agents.

One often overlooked tool is the simple document explaining the context of the project, specific features and current tasks. When LLMs work seamlessly with Architecture Decision Records and diagram-as-code tools, I expect to see better results. The memory bank approach seems to be a step in that direction, though it’s hard to assess how effective it is.

I have noticed in this exercise that supplying the LLM with context of how a flow works currently (e.g. loading the data), allows it to identify the necessary changes more easily.

Diagrams as code now play a role not just for human developers, but also as a way to encode context for the application. There’s a feedback loop here between the LLM generating documentation, and using it as input for further tasks.

Effective Vibing

The real question is about the effectiveness of the vibe coding approach. With what degree of agent independence can we achieve good results?

I’m not sure how to assess this. One approximation might be the ratio of bugs to the product of user chat messages and lines generated in a given vibe coding session. But there are obviously other parameters involved.

It will be interesting to measure this over time, with more tightly integrated tools and improved LLMs.
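
One way to make that approximation concrete is sketched below. It reads the ratio as bugs divided by (user chat messages times lines generated); that grouping is an assumption on my part, as is the function name, and the session numbers are made up purely to show the arithmetic.

```python
def bug_rate(bugs, user_messages, lines_generated):
    """Bugs divided by (user messages x generated lines) for one session."""
    effort = user_messages * lines_generated
    if effort == 0:
        raise ValueError("need at least one message and one generated line")
    return bugs / effort


# Made-up numbers, only to show how two sessions would be compared.
session_a = bug_rate(bugs=4, user_messages=20, lines_generated=500)
session_b = bug_rate(bugs=4, user_messages=40, lines_generated=500)
```

With these inputs, `session_a` works out to 4 / 10,000 and `session_b` to 4 / 20,000. Whether a lower number really reflects a better session is exactly the open question, since more chat messages also mean more human involvement.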

I’m not sure how this will evolve over time. I do think, however, that if LLMs with coding tools are reduced to a glorified low-code platform it will be a miss for software engineering in general. The technology seems to be more powerful than that, since it has the potential to more easily bridge the gap between human language and rigorous computer programs; and do it in both directions.

On to explore more.



  1. Not that there’s anything wrong with that
  2. Yep, I asked Cursor to keep track of the changes at this point
  3. A phrase which, I guess, is close to becoming a meme unto itself
  4. But then again, not sure it’s a problem in the long run
  5. Always looking for opportunities to paraphrase one of my favorite book series; couldn’t resist this one

AI and the Nature of Programming – Some Thoughts

So, AI.
It’s all the rage these days. And apparently for good reason.

Of course, my curiosity, along with a fair amount of FOMO[1], leads me to experiment with and learn the technology. There’s no shortage of tutorials, tools and models. A true Cambrian explosion of technology.
This also aligns fairly easily with my long-time interests in development tools, development models, and software engineering in general. So there’s no shortage of motivation to dive into this from a developer’s perspective[2].

And the debate is on, especially when it comes to development tools.

It’s no secret that tools powered by large language models (LLMs), like GitHub Copilot, Cursor, and Windsurf[3], are becoming indispensable. Developers everywhere are adopting them as an essential part of their daily toolset. They offer the ability to generate code snippets, debug errors and refactor code with remarkable speed. This shift has sparked a fascinating debate about the role of AI in coding. Is it merely a productivity booster? Or does it represent a fundamental change in how we think about programming itself?

At its core, coding with AI promises to make software development faster and arguably more accessible. For simple, well-defined tasks, AI can produce functional code in seconds. This reduces the cognitive load on developers and allows them to focus on higher-level problem-solving. But software development in the wild, especially for ongoing product development, becomes very complicated very quickly. As complexity grows, the limitations of AI-generated code become obvious. While LLMs can produce code quickly and easily, the quality of their output often depends on the developer’s ability to guide and refine it.

So while AI excels at speeding up simple tasks, there are still challenges with more complex tasks. And there are implications for the ability to maintain code over time. But I cannot deny we’re apparently at the beginning of a new era. And this raises the question of whether traditional notions of “good code” still apply in an era where AI might take over the bulk of maintenance work.

And I ask myself (and you): can we imagine a future where AI no longer generates textual code? Instead, it operates on some other canonical representation of logic. Are we witnessing a shift in the very nature of programming?

Efficiency of AI in Coding

Before diving into the hard(er) questions, let’s take a step back.

One of the most compelling advantages of coding with AI is its ability to significantly speed up the development process. This is especially true for simple and focused tasks. AI-powered tools, like GitHub Copilot and ChatGPT, excel at generating boilerplate code, writing repetitive functions, and even suggesting entire algorithms based on natural language prompts. For example, a developer can describe a task like “create a function to sort a list of integers in Python,” and the AI will instantly produce a working implementation. This capability not only saves time. It also reduces the cognitive burden on developers[4]. Consequently, developers can focus on more complex and creative aspects of their work.
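
For the sorting prompt mentioned above, the kind of answer one would typically expect looks something like the sketch below. This is an illustration of the task's size, not the output of any particular tool, and the function name is my own choice.

```python
def sort_integers(values):
    """Return a new list with the integers in ascending order."""
    return sorted(values)


numbers = [42, 7, 19, 7, -3]
ordered = sort_integers(numbers)
```

The task is small, fully specified and needs no surrounding context, which is precisely why it is the easy case.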

The efficiency of AI in coding is particularly evident in tasks that are well-defined and require minimal context. Writing unit tests, implementing standard algorithms, or formatting data are all areas where AI can outperform human developers in terms of speed. AI tools can also quickly adapt to different programming languages and frameworks, making them versatile assistants for developers working in diverse environments. For instance, a developer switching from Python to JavaScript can rely on AI to generate syntactically correct code in the new language, reducing the learning curve and accelerating productivity. I often use LLMs to create simple scripts quickly instead of consulting documentation on some forgotten shell scripting syntax.

AI’s effectiveness in coding often depends on the developer’s ability to simplify tasks. The developer should break down larger, more complex tasks into smaller, manageable components. AI thrives on clarity and specificity; the more focused the task, the better the results. Yes, we have thinking models now, and they are becoming better every day. Still, they require supervision and accurate context. Contexts are large, and they’re not cheap.

At this point in time, developers still need to break down complicated tasks into more manageable subtasks to be successful. This is often compared to a senior developer/tech lead detailing a list of tasks for a junior developer. I often find myself describing a feature to an LLM, asking for a list of tasks before coding, and then iterating over it together with the LLM. This works quite well in small focused applications. It becomes significantly more complicated with larger codebases.

While AI excels at handling simple and well-defined tasks, its performance tends to diminish as the complexity of the task increases. This is not necessarily a limitation of the AI itself but rather a reflection of the inherent challenges in translating high-level, ambiguous requirements into precise, functional code. For example, asking an AI to “build a recommendation system for an e-commerce platform” is a very complex task. In contrast, requesting a specific algorithm, like “implement a collaborative filtering model”, is simpler. The former requires a deep understanding of the problem domain, user behavior, and system architecture. These are areas where AI still struggles without significant human guidance.

As it stands today, LLMs act as a force multiplier for developers, enabling them to achieve more in less time. The true potential is realized when developers approach AI as a collaborative tool rather than a fully autonomous coder.

The “hands-off” approach (aka “Vibe coding”), where developers rely heavily on AI to generate code with minimal oversight, often leads to mixed results. AI can produce code that appears correct at first glance. Yet, it can contain subtle bugs, inefficiencies, or design flaws that are not immediately obvious. This is just one case I came across, but there are a lot more of course.

It’s not just about speed

But it’s more than simple planning, prompt engineering and context building. AI can correct its own errors, autonomously.

One of the most impressive features of AI in coding is its ability to detect and fix errors. When an LLM generates code, it doesn’t always get everything right the first time. Syntax errors, compilation issues, or logical mistakes can creep in. Yet, modern AI tools are increasingly equipped to spot these problems and even suggest fixes. For instance, tools like Cursor’s “agent mode” can recognize compilation errors. These tools then automatically try to correct them. This creates a powerful feedback loop where the AI not only writes code but also improves it in real time.

But it’s important to note that there’s a collaboration here between AI and traditional tooling. Compilers make sure that the code is syntactically correct and can run, while LLMs help refine it. Together, they form a system where errors are caught early and often, leading to more reliable code. I have also had several cases where I asked the LLM to make sure all tests pass and there are no regressions. It ran all tests and fixed the code based on broken tests.
That is, without human intervention in that loop.

So AI, along with traditional tools (compilers, tests, linters) can be autonomous, at least to a degree.
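
The loop described above can be sketched in a few lines. This is a toy under stated assumptions, not how Cursor's agent mode is implemented: Python's built-in `compile()` plays the role of the traditional tool, and `propose_fix` is a hypothetical stand-in for the call to a model. The attempt cap is there so the loop stops instead of retrying forever.

```python
def syntax_errors(source):
    """Use Python's own compiler as the 'traditional tool' in the loop."""
    try:
        compile(source, "<generated>", "exec")
    except SyntaxError as error:
        return [f"line {error.lineno}: {error.msg}"]
    return []


def repair_loop(source, propose_fix, max_attempts=3):
    """Alternate between checking the code and asking for a fix.

    propose_fix stands in for the LLM call: it receives the current source
    and the error list and returns new source.
    Returns (source, number_of_fixes_applied).
    """
    for attempt in range(max_attempts + 1):
        errors = syntax_errors(source)
        if not errors:
            return source, attempt
        if attempt == max_attempts:
            break
        source = propose_fix(source, errors)
    raise RuntimeError(f"still failing after {max_attempts} fixes: {errors}")


def toy_fixer(source, errors):
    """Hypothetical stand-in for a model: adds the colon the compiler misses."""
    return source.replace("def add(a, b)\n", "def add(a, b):\n")


broken = "def add(a, b)\n    return a + b\n"
fixed, attempts = repair_loop(broken, toy_fixer)
```

Swapping `syntax_errors` for a test runner or a linter gives the same shape of loop: the tool supplies the ground truth and the model supplies the candidate fix.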

It’s not just about correct code

As we all know, producing working code is only one (important) step when working as an engineer. It’s only the beginning of the journey. This is especially true when working on ongoing product development. It is probably less so in time-scoped projects. In ongoing projects, development never really stops. It continues unless the product is discontinued. There are mountains of tools, methodologies and techniques dedicated to maintaining and evolving code over time and at scale. It is often a much tougher challenge compared to the initial code generation.

One of the biggest criticisms of AI-generated code is that it often lacks maintainability. Maintainable code is code that is easy to read, understand, and change over time. Humans value this because it makes collaboration and long-term project evolution easier. Yet, AI doesn’t always prioritize these qualities. For example, it might generate long, convoluted functions or introduce unnecessary methods that make the code harder to follow.

The reality is that code produced by an LLM, while often functional, may not always align with human standards of readability and maintainability.
I stopped counting the times I’ve had some LLM produce running, and often functionally correct, code that was horrible in almost every aspect of clean and robust code. I dare say a lot of the code produced is the antithesis of clean code. And yes, we can use system prompts and rules to facilitate better code. However, it’s not there yet, at least not consistently. This issue is not necessarily a fault of AI itself. It reflects the difficulty in defining and agreeing on what constitutes “good code”.

Whether or not LLMs get to the point where they can produce more maintainable code is uncertain. I’m sure they can improve, and we haven’t seen the end of it yet. I wonder if that is a goal we should be aiming for in the first place. We want “good” code, because we are there to read it, and work with it after the AI has created it.

But what if that wasn’t the case?

A code for the machine, by the machine

LLMs are good at understanding our text, and eventually acting on it – producing the text that will answer our questions/instructions. And when it comes to code, they produce code, as text. But that text is for us – humans – to consume. So we review and judge through this lens – code that we need to understand and work with.

We do it with the power of our understanding, but also with the tools that we’ve built to help us do it – compilers, linters, etc. It’s important to note that language compilers are tools for humans to interact with the machine. It’s a tool that’s necessary when dealing with humans instructing the machine (=writing code). The software development process, even with AI, requires it because the LLM is writing code for humans to understand. It also allows us to leverage existing investments in programming.

But when an LLM is generating code for another LLM to review, and when it iterates on the generated response, the code doesn’t need to be worked on by humans. Do we really need the code to be maintainable and clear for us?
Do we care about duplication of code? Meaningful variable and class names? Is encapsulation important?
LLMs can structure their output, and consume structured inputs. Assuming LLMs don’t hallucinate as much, then I’m not sure type checking is that impactful either.

I think we should not care so much about these things.
At the current rate of development around LLMs there’s no reason we shouldn’t get to a point where LLMs will be able to analyze an existing code base and modify/evolve it successfully without a human ever laying eyes on the code. It might require some fancy prompting or a combination of multiple LLM agents, but we’re not that far off.

Another force at play here, I believe, is that code can be made simpler and more straightforward if it doesn’t need to abstract away much of the underlying concepts. A lot of the existing abstractions are there because of humans. Take for example UI frameworks, or different SDKs, component frameworks and application servers. Most of the focus there is on abstracting concepts and letting humans operate at a higher level of understanding. It can be leveraged by LLMs, but it doesn’t necessarily have to be. Do I need an ORM framework when the LLM simply produces the right queries whenever it needs to?
Do I need middleware and abstractions over messaging when an AI agent can simply produce the code it needs, and replicate it whenever it needs to?

My point is, a lot of the (good) concepts and tools and frameworks we created in programming are good and useful under the assumption that humans are in the loop. Once you take humans out of the loop, are they needed? I’m not so sure.

The AI “Compiler”

Let’s take it a step further.
Programming languages are in essence a way for humans to connect with the machine. It has been this way since the early days of assembly language. And with time, the level of expression and ergonomics of programming languages have evolved to accommodate humans working with computers. This is great and important, because we humans need to program these damn computers. The easier it is for us, the more we can do.

But it’s different when it’s not a human instructing the machine. It’s an AI that understands the human, but then translates it to something else. And another AI works to evolve this output further. Does the output need to be understandable by humans?
What if LLMs understand the intent given by us, but then continue to iterate over the resulting program using some internal representation that’s more efficient?

Internal representations are nothing new in the world of compilers. Compiler developers program them to enable various operations that compilers often perform. Operations like optimizations, type checking, tool support, and generating outputs.
Why can’t LLMs communicate over code using their own internal representation, resulting in much more efficient operation and lower costs?
This is not just for generating a one-time binary, but also for evolving and extending the program. As we observed above, software engineering is more than a simple generation of code.
It doesn’t have to be something fancy or with too many abstractions. It needs to allow another LLM/AI to work and continue to evolve it whenever a new feature specification or bug is found/reported.
Do we really need AI to produce a beautiful programming language, mimicking some form of structured English, when there’s no English reader who’s going to read and work on it?
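
For readers who have not worked with compiler internals, here is a minimal sketch of what an internal representation is in the conventional sense: a human-readable expression is lowered to a flat list of stack instructions, which is then what actually gets executed. It is a toy built on Python's `ast` module with made-up opcode names, and it says nothing about what an LLM-oriented representation would actually look like.

```python
import ast
import operator

# Supported operators, mapped to a made-up opcode name and its implementation.
OPS = {
    ast.Add: ("ADD", operator.add),
    ast.Sub: ("SUB", operator.sub),
    ast.Mult: ("MUL", operator.mul),
}


def lower(expression):
    """Lower an arithmetic expression to a flat list of stack instructions."""
    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [("PUSH", node.value)]
        if isinstance(node, ast.Name):
            return [("LOAD", node.id)]
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            opcode = OPS[type(node.op)][0]
            return emit(node.left) + emit(node.right) + [(opcode,)]
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return emit(ast.parse(expression, mode="eval").body)


def run(program, variables):
    """Execute the instruction list on a simple stack machine."""
    by_name = {name: function for name, function in OPS.values()}
    stack = []
    for instruction in program:
        opcode = instruction[0]
        if opcode == "PUSH":
            stack.append(instruction[1])
        elif opcode == "LOAD":
            stack.append(variables[instruction[1]])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(by_name[opcode](left, right))
    return stack.pop()


program = lower("price * quantity + 5")
result = run(program, {"price": 3, "quantity": 4})
```

The instruction list is harder for a person to read than the original expression but trivial for a program to process, which is the trade-off the questions above are pointing at.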

Why not have AI agents produce something like a “textual/binary Gibberlink”, an AI-oriented “bytecode”, when producing our programs?

Is the human to machine connection through a programming language necessary when we have a tool that understands our natural language well enough, and can then reason on its own outputs?

LLMs can already encode their output in structured formats (e.g. JSON) that are machine processable. Is it that big of a leap to assume they’d be able to simply continue communicating with themselves and get the job done based on our specifications, without involving us in the middle?
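
As a small illustration of what “machine processable” means in practice, the sketch below parses a JSON reply and checks it against the fields a downstream program expects. The reply format, the field names and the file name `graph.js` are all hypothetical; no real model or API is involved.

```python
import json

# Hypothetical reply contract: field name -> expected Python type.
REQUIRED_FIELDS = {"action": str, "file": str, "lines_changed": int}


def parse_reply(reply_text):
    """Parse a JSON reply and check it has the expected fields and types."""
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"missing or invalid field: {field}")
    return data


# Made-up reply text standing in for a model's structured output.
reply = '{"action": "edit", "file": "graph.js", "lines_changed": 12}'
parsed = parse_reply(reply)
```

Once the output is validated data rather than free text, another program (or another agent) can act on it without a human reading it first.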

Vibe coding is apparently a thing now. I don’t believe it’s a sustainable trend[5]. But the main reason is that it focuses on a specific point in the software life cycle – the point of generating code.
What if we can take it to the extreme? What if we remove the human from the coding process throughout the software life cycle?

I can’t really predict where this is going. At this point I don’t know the technology well enough to guesstimate, and I’m no oracle. But I do see this as one possible direction with a definite upside. And it’s definitely interesting to follow.

If programming is machines talking to machines, maintainability and evolution of code become a different game.

“Code” becomes a different game.

Is programming dead?

What would such a future hold for the programming professionals?

Again, I’m not great at making prophecies. But the way I see it, and looking at history, I don’t belong to the pessimistic camp. So in my opinion – no, I don’t subscribe to the notion that programming is dead.

History has taught us an important lesson. Creating software with more expressive power did not decrease the amount of software created. A higher level of abstraction did not lessen software production either. Quite the contrary. More tools, and being capable of working at higher levels of abstraction, meant that more software was created. Demand grew as well. It’s just the developers that needed to adapt to the new tools and concepts. And we did[6].

Demand for software still exists, and it doesn’t look like it’s receding. I believe that developers who adapt to this new reality will still be in demand as well.

I expect LLMs will improve, even significantly, in the foreseeable future. But this doesn’t mean there’s no need for software developers. I expect software development tasks to become more complex. As developers free their time and minds from the gritty details of sorting algorithms, database schemas and implementing authentication schemes, they will focus on bigger, more complicated tasks. So software development doesn’t become less complicated, we’re just capable of doing more stuff. Complexity is simply moving to other (higher level?) places[7].

Could it be that software architects will become the new software engineers?
Are all of us going to be “agent coders”?

I really don’t know, but I intend to stick around and find out.

Where do you think this is going?


  1. And, admittedly, fear of becoming irrelevant
  2. And yes, AI was used when authoring this post, a bit. But no LLM was harmed, that I know of
  3. Originally I intended to add more examples, but realized that by the time I finish writing a list, at least 3 new tools will be announced. So… [insert your favorite AI dev tool here]
  4. Give or take hallucinations
  5. Remember no-code?
  6. I’m old enough to have programmed in Java with no build tool, even before Ant. Classpath/JAR hell was a very real thing for me.
  7. “The law of complexity preservation in software development”?