
Exploring Vibe Coding with AI: My Experiment

In my previous post I mentioned vibe coding as a current trend in coding with AI. But I hadn’t actually tried it myself.

So I’ve decided to jump on the bandwagon and give it a try. Granted, I’m not the obvious target audience for this technique, but before passing judgment I had to see/feel it for myself.

This isn’t the first time I’ve generated code using an LLM with some prompting. But this time I was more committed to trying out “the vibe”. To be clear, I did not intend to go all in with voice commands, transcription, and watching Netflix while the LLM worked. I did intend to review the code and stay in touch with the output at every point. I wanted to test the tool’s capabilities while still being very much aware of what was going on.

Below is an account of what happened, my thoughts and conclusions so far.
A general disclaimer is, of course, in order: I’m still exploring these tools, and it’s quite possible the process could be improved. My perspective, however, is very much shaped by my background as a developer. My choice of tools and how I use them is therefore biased towards an experienced developer looking to increase productivity, not a non-coder looking to crank out one-off applications1.

The Setup

I set out to create a simple new tool for myself (actually to be used at work): something I genuinely find useful, that isn’t an obvious side project done a million times before, and is therefore less likely (I hope) to be in the LLM’s training data. It’s a project done from scratch, in an area where I don’t have a lot of experience, and it’s meant to be fairly limited in scope.

The project itself is a “Knowledge Graph Visualizer”, essentially an in-browser viewer of a graph representing arbitrary concepts and their relationships. I intended this to be purely in-browser JS code. The main feature is a 3D rendering of the graph, allowing navigation through the concepts and their links. You can see the initial bare specification here.
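To give a sense of the data involved, here is a minimal sketch of the kind of JSON a graph file might contain (the field names here are illustrative, not the project’s actual schema):

```json
{
  "nodes": [
    { "id": "llm", "label": "LLM", "group": "concept" },
    { "id": "vibe-coding", "label": "Vibe Coding", "group": "technique" }
  ],
  "links": [
    { "source": "vibe-coding", "target": "llm", "label": "relies on" }
  ]
}
```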

To get a feel for the project, here’s a current screenshot:

KG-Viewer showing its own knowledge graph

With respect to tooling I went with Cursor (I use Cursor Pro), primarily using the Claude 3.7 Sonnet model. The initial code generation was actually done with Gemini 2.5 Pro, but I quickly ran out of credits there, so the bulk of the work was done with Cursor.

I did not use any custom Cursor rules or MCP tools. This may have altered the experience to a degree (though I doubt it), so I’ll need to keep experimenting as I explore these tools and techniques.

Getting Into the Vibe

It actually started out fairly impressively. Given the initial spec, Gemini generated 6 files that provided the skeleton for a proof of concept. All of these files are still there. I did not look too deeply into the generated code. Instead, I initialized a simple empty project, launched Cursor, and copied the files there. With a few tweaks2, it worked. I had a working POC in about one hour of work, without ever having coded a 3D graph rendering before.

Magic!
I’ll be honest – I was impressed at first. I got a working implementation for drawing a graph with Three.js, driven by a JSON schema describing the graph. Given that I had never laid eyes on Three.js before, this was definitely faster than I would have reached even this simple POC on my own.

I did peek at the code. I wasn’t overly impressed by it – there was a lot of unnecessary repetition, very long functions, and some weird design choices. For example, having a style.css hold all the style classes, while at the same time generating new styles and dynamically injecting them into the document.
But, adhering to my “vibe coder code”, I did not touch the code, instead working only with prompts.

Then I started asking for more features.

Cursor/Claude, We Have a Problem

A POC is nice. But I actually need a working tool. So I started asking for more features.
Note that I did not just continue to spill out requests in the chat. I followed the common wisdom – using a new chat instance, laying out the feature specification, and working step by step on planning and testing before implementation.

I wrote a simple feature file, meant to let me trace the feature’s spec and implementation.
The general structure is simple:

- Feature Specification
- Plan
- Testing
- Implementation Log

I fill in only the Feature Specification, and let Cursor fill in the Plan (after approval) and the Implementation Log as we proceed.

The idea was to have a working log of progress, serving both as a record of the work and as context for future chat sessions.
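As a rough sketch, such a feature file might look something like this (the section contents are illustrative, not the actual file):

```markdown
# Feature: Data Retrieval

## Feature Specification
<what the feature should do, written by me>

## Plan
<proposed steps, filled in by Cursor and approved before implementation>

## Testing
<how we verify the feature works>

## Implementation Log
<running notes appended by Cursor as the work proceeds>
```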

I don’t intend to re-create the entire chat session or all my prompts here, as this is not meant to be a tutorial on LLM techniques. But it’s fair to say that the first feature (data retrieval) was implemented fairly easily, using only prompts.

Just One Small Change…

I was actually still pretty impressed at this point, so I simply asked for one tiny feature – showing node and link labels. I did it without creating an explicit “feature file”.

The code didn’t work. So I asked Cursor to fix it. And this quickly spiraled out of control. On every request, Cursor’s agent of course notified me that it had definitely figured out the issue and now had the fix (!).
It didn’t.

I remained loyal to the “vibe coder creed” and did not try to debug or fix the code myself. Instead, I deliberately went in cycles of prompting for fixes, accepting changes blindly, testing, and prompting again with new errors.

Somewhere along this cycle, the code changes made by the agent created regressions in the application’s code, resulting in the application not loading at all.

After roughly 3 hours, and a lot more grey hair, I noticed that the Cursor agent was going in circles – simply trying the same 3 solutions over and over, with no idea what was wrong, yet still confidently hallucinating solutions (“Now I see the issue…”3).

This was so frustrating that at this point I simply took it upon myself to actually look at the code, which was a complete mess. I examined the problematic code, consulted git diffs to restore basic functionality, and solved the actual issue with about 10 more minutes of Googling.

To be fair, from my very rudimentary Google search it seemed my request (link labels) wasn’t that easy to achieve in the first place – it’s apparently not that obvious (again, without being an expert on Three.js). I relaxed the requirement a bit and found a simple solution.
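For the curious, the relaxed approach is along these lines – drawing the label text onto a canvas and attaching it as a sprite that always faces the camera. This is a generic sketch, not the project’s exact code:

```js
import * as THREE from 'three';

// Draw the label text onto a canvas and wrap it in a sprite,
// so it always faces the camera (a common "good enough" labeling approach).
function createLabelSprite(text) {
  const canvas = document.createElement('canvas');
  canvas.width = 256;
  canvas.height = 64;
  const ctx = canvas.getContext('2d');
  ctx.font = '32px sans-serif';
  ctx.fillStyle = '#ffffff';
  ctx.textAlign = 'center';
  ctx.textBaseline = 'middle';
  ctx.fillText(text, canvas.width / 2, canvas.height / 2);

  const material = new THREE.SpriteMaterial({
    map: new THREE.CanvasTexture(canvas),
    transparent: true,
  });
  const sprite = new THREE.Sprite(material);
  sprite.scale.set(4, 1, 1); // world-space size of the label
  return sprite;
}

// Usage: place a label slightly above a node's mesh.
// const label = createLabelSprite(node.label);
// label.position.copy(nodeMesh.position).add(new THREE.Vector3(0, 1.5, 0));
// scene.add(label);
```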
Still, the whole back-and-forth cycle of code changes, especially to unrelated code, was very much counterproductive. The vibes were all wrong. Getting back to working code took another 2-3 hours.

At this point I was thinking “oh well, you can’t win them all”. I wanted to turn to something simple. And looking at the state of the code, a simple cleanup should be easy enough, right?

Right? …

Now It’s Just Cleanup

Well … it depends.

I went back into “vibe coding” mode. This time, I defined very basic code cleanup procedures. I then asked Cursor’s agent (in a new session) to go through the source code and follow these steps to clean it up.

It actually did reasonably well on the small files. The bigger files proved more challenging: trying to clean them up ended up messing them up completely. For some reason, the LLM agent removed functioning code and created regressions in functionality. Trying to fix those quickly ended up causing more issues. It was clearly guessing at this point.

Given my battle scars from the previous feature request, I avoided this hallucination death spiral. Instead, I went through the git history, found a working version, and restored the working code “by hand” – actually typing in code. I wasn’t a vibe coder anymore, but the application worked, the code was cleaner, and my blood pressure remained fairly low (I think).

The experience felt like trying to mentor a junior developer to code without creating regressions. The problem is that it’s a fast and confident junior developer with short-term memory loss, apparently so eager to please that it simply spews out code that looks remotely connected to the problem at hand, with little understanding of context – proving ignorant even of changes it itself made to the code.

Documentation for Man and Machine

At this point I decided to go back to basics, to where LLMs truly shine – understanding and creating text. I asked it to create documentation for specific flows in the code (the init sequence, clicking on a legend item). Unsurprisingly, with a few simple prompts, the agent produced decent documentation for what I asked, including mermaid.js diagram code.
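For illustration, the generated diagrams were along these lines (a simplified, hypothetical version of the init-sequence diagram; the module names are made up for the example):

```mermaid
sequenceDiagram
    participant User
    participant Main as main.js
    participant Data as graph-data.js
    participant Renderer as graph-renderer.js
    User->>Main: open page
    Main->>Data: load graph JSON
    Data-->>Main: nodes + links
    Main->>Renderer: init scene, camera, controls
    Renderer-->>User: interactive 3D graph
```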

This is important not simply because it allowed me to document the project easily, which is nice. Creating textual documentation of a specific flow also allowed me to provide better context to other chat sessions. And this is an important insight – textual descriptions of the code are useful for humans as well as for LLMs.

Other Features

At this point I turned to developing more features – loading data and “node focus”. In both cases I went back to providing feature files with specifications, and asking the agent to update the files with plans and implementation logs.

I was a bit more cautious now. I reviewed code more carefully and intervened where I felt the code wasn’t good. In some cases it was obvious the code wasn’t functionally correct, but instead of trying to “fight” with the agent, I accepted the code and went on to change it myself.

A repeating phrase in all my prompts at this point was:

Do minimal code changes. Change only what is needed and nothing more.

This, combined with being more cautious and careful, produced pretty good results. I managed to implement two features in a short time – probably a bit less than it would have taken me to run through Three.js tutorials and do it myself.

Final Thoughts

So where does this leave me?

I have a working application. If I’d had to learn Three.js from scratch myself, it would have taken considerably longer to create. It’s working, and it’s useful. That is an important bottom line.

Small Application, Good Starting Point

The initial code generated by the LLM (Gemini or Claude) does serve as a good starting point, especially in areas or frameworks that are unfamiliar to the developer.

But this is still a far cry from replacing developers. There are tool limitations, some of them, I expect, introduced by Cursor rather than the LLM. These limitations can cause havoc if the agent is left to proceed with no oversight.
And review is harder when there’s a ton of unorganized code4.

We can probably make it better with rules, better prompts, and combinations of agents. And, of course, with advances in LLM training.

This is a good starting point. But we need to remember this is a very small application, built from scratch. In the real world, a lot of use cases are not that simple at all. The more I read and think about it, the more this bears a striking resemblance to no-code/low-code tools. In those cases too, it’s easy to achieve quick results for simple use cases, but very hard to sustain development when features creep in or the application needs to scale.

It’s not that low-code tools don’t have their place. They serve a very specific (viable) niche. But as experience shows, they haven’t replaced developers.

Could this be different?
What would it take to tackle more serious challenges, with “vibe coding”?

Context is King

It’s quite obvious that in the kingdom of tokens, amidst ramparts of code and winds of chat messages, there is only one king, and its name is Context5. As LLMs are limited in their context size, and a lot of it is taken up by the wrapping tools (Cursor in this case), context in an LLM chat is expensive real estate.

So while context windows can get big, we’ll probably never have enough when we get to more complicated tasks and bigger code bases. There’s a preservation of complexity at play.

Accuracy and precision in the context play a crucial role in effectiveness. Context passed to LLMs needs to be information-dense. We should probably start considering how efficient the context we provide to LLMs is. I don’t know how to measure context efficiency yet, but I believe it will be important for staying effective as tasks become more complicated.

But there’s more than just the LLM and how to operate it.

You’re Only as Good as Your Tools, Also When Vibing

It’s quite clear that mistakes made by LLMs, and by humans, can be avoided or caught with the help of the right tools. Even in my small example described above, the LLM agent’s cooperation with external tools (console logs, shell commands) resulted in better understanding and a more independent agent.

I suspect that having more tools, e.g. a relevant MCP server for documentation, can help significantly. I expect the integration of LLMs with tools will become more prominent, and more necessary, to create more independent coding agents.

One often-overlooked tool is the simple document explaining the context of the project, specific features, and current tasks. When LLMs work seamlessly with Architecture Decision Records and diagram-as-code tools, I expect to see better results. The memory bank approach seems to be a step in that direction, though it’s hard to assess how effective it is.

I noticed in this exercise that supplying the LLM with context on how a flow currently works (e.g. loading the data) allows it to identify the necessary changes more easily.

Diagram-as-code now plays a role not just for human developers, but also as a way to encode context about the application. There’s a feedback loop here between the LLM generating documentation and then using that documentation as input for further tasks.

Effective Vibing

The real question is about the effectiveness of the vibe coding approach: with what degree of agent independence can we achieve good results?

I’m not sure how to assess this. One approximation might be the rate of bugs relative to the number of user chat messages and lines of code generated in a given vibe coding session. But there are obviously other parameters involved.

It will be interesting to measure this over time, with more integrated tooling and improved LLMs.

I’m not sure how this will evolve over time. I do think, however, that if LLMs with coding tools are reduced to a glorified low-code platform, it will be a miss for software engineering in general. The technology seems more powerful than that, since it has the potential to more easily bridge the gap between human language and rigorous computer programs – and to do so in both directions.

On to explore more.



  1. Not that there’s anything wrong with that ↩︎
  2. Yep, I asked Cursor to keep track of the changes at this point ↩︎
  3. A phrase which, I guess, is close to becoming a meme unto itself ↩︎
  4. But then again, not sure it’s a problem in the long run ↩︎
  5. Always looking for opportunities to paraphrase one of my favorite book series; couldn’t resist this one ↩︎