
Vibe coding: Can we reimagine software delivery with AI, or does it fall short?

By Nick Dunn

In our Techblog series, discover the thinking, creativity, and curiosity of our digital and data experts, sharing ideas, experiences, and innovations that are shaping the way we work and create impact.

Nick leads cross-functional teams of digital strategists, designers, architects, and software engineers, driving experimentation in AI-assisted software delivery and scaling our product and technology offerings.

To solve our clients’ biggest challenges, we need to be in a permanent state of experimentation with our approach to software engineering delivery.

The explosion of genAI tooling has presented us all with the challenge of choosing where to invest time and energy, so we pulled together a cross-functional team to experiment with vibe coding techniques so we could advise our clients on what’s possible, what’s practical, and what’s worth pursuing. How far could we get in an afternoon by prompting alone, and what learnings could we take to our project teams and clients?

Along with my PA software engineering colleagues, I'm using AI-assisted programming in my day-to-day work, whether it’s GitHub Copilot, ChatGPT, or our own suite of internal Genie apps. And while not all AI-assisted programming is strictly “vibe coding”, we wanted to go a step further and test some theories: what happens when you step away from the code, describe the output you want, and let genAI find the best route to the solution?

To test our thinking, we gathered 15 of our digital strategists, designers, architects, and software engineers, as well as business leaders, in a workshop room with a loose brief to develop an internal knowledge content management system and deploy a working prototype in one afternoon.

The condensed software development lifecycle

To squeeze this activity into an afternoon, each discipline came with a set of tooling they wanted to experiment with, from different ChatGPT and Gemini models to Copilot, Cursor, and v0, all ready to go.

First, we started with verbal requirements from stakeholders: ambitious, at times ambiguous, and communicated to the team as a stream of consciousness. Next, we recorded these verbal requirements and used ChatGPT to structure them into a Product Requirements Document (PRD) that leadership could validate before proceeding. We then used various chat-based large language models (LLMs) to transform the requirements into a high-level solution design, complete with an analysis of trade-offs for each option. Finally, we let AI generate UI designs based on the functional requirements, then frontend and backend code, through to deployment pipelines.
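As a rough illustration of that first step, the sketch below shows how the transcript-to-PRD stage could be scripted with the OpenAI Python SDK. It’s a minimal sketch only: on the day we worked through the ChatGPT interface, and the model name, prompt wording, and file names here are assumptions rather than what we actually used.

  # Illustrative sketch only: the workshop used the ChatGPT interface directly.
  # The model name, prompt wording, and file names are assumptions.
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  # Raw, stream-of-consciousness requirements captured from the stakeholder session
  with open("stakeholder_transcript.txt", encoding="utf-8") as f:
      transcript = f.read()

  response = client.chat.completions.create(
      model="gpt-4o",
      messages=[
          {
              "role": "system",
              "content": (
                  "You are a product analyst. Turn the transcript into a "
                  "Product Requirements Document with sections for goals, user "
                  "roles, functional requirements, non-functional requirements, "
                  "and open questions. Flag ambiguity instead of inventing detail."
              ),
          },
          {"role": "user", "content": transcript},
      ],
  )

  # Save the draft PRD so leadership can validate it before design work starts
  with open("prd_draft.md", "w", encoding="utf-8") as f:
      f.write(response.choices[0].message.content)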

Developers and designers became orchestrators, editors, and reviewers. No one wrote a line of code by hand, and by the end of the session, we had a working deployment on Vercel that the business could feed back on.

What worked surprisingly well

We found some clear strengths in this approach, most notably in speed:

  • Shaping chaos into clarity. We fed in chaotic verbal requirements (someone literally started describing the system by drawing shapes in the air) and got back a clean, readable requirements document that made sense to everyone.
  • Architectural reasoning was sound. We were impressed by how well tools like ChatGPT and Cursor could reason about architectural patterns. They didn’t just throw out buzzwords, but considered trade-offs, suggested relevant tooling, and even identified genuine bottlenecks when told the solution needed to be developed using genAI tooling alone.
  • Developers didn’t write code. That sounds obvious, but it was a huge moment. Cursor scaffolded the entire project, including API endpoints and unit test stubs. The software engineers weren’t passive observers: they were reviewing, fixing edge cases, and checking logic, but they weren’t typing code.
  • Design velocity was high. AI tools spat out clean, modern user interfaces (UIs) almost instantly. We didn’t need to wait for wireframes or mock-ups; we could test workflows immediately.

What didn’t work (yet)

There were a significant number of friction points we had to work through:

  • Chat-based tools are single-player. Collaboration felt awkward. For chat-based tooling we had one person ‘driving’ the AI at a time, and transferring context between users was painful, with no real session persistence or shared state across the team. We kept asking “Can you paste me the latest output?” via Slack and Teams, which quickly became tedious.
  • The UI didn’t feel designed. While the interface was usable, it lacked soul. It had that instantly recognisable ‘default SaaS’ look. Our designer lamented the fact he didn’t actually do any design.
  • Engineering trust wasn’t automatic. Our engineers weren’t ready to ship what the AI wrote without some scrutiny. It didn’t make catastrophic errors, but we did need to keep checking the code itself to catch and fix bugs.
  • Complex repo workflows broke things. Cursor struggled with our monorepo setup and couldn’t handle multi-branch workflows effectively. We ended up splitting frontend and backend into different repos just to get things moving.
  • Diagrams were a weak spot. Whether it was boxes and arrows for system architecture or flowcharts for business logic, the SVG and Mermaid output from ChatGPT, Copilot, and Gemini was ugly at best and, more often, entirely broken and unusable.

The engineer’s view: Cautious optimism

From the engineering team, the sentiment was somewhere between curious and sceptical. There was excitement about prototyping speed, but genuine wariness about depth and quality. There was also a concern about observability, maintainability, and testing. After a full review, we found code that wasn’t particularly well-commented or structured, so we’ve already started to define internal PA patterns for how we “trust but verify” genAI output, especially around error handling, authentication logic, test scenarios, readability, and dependency management.

The designer’s view: Tension versus opportunity

Our designers had a similarly mixed reaction, reflecting a real tension between speed and craft. Cursor and v0 can produce decent default layouts, consistent design tokens, and responsive components. But they don’t understand tone, nuance, accessibility edge cases, or micro-interactions, nor do they have a point of view on what our users want and need. The generated interface looked clean and functional, but it didn’t feel ours.

That said, there are clear opportunities here for designers to iterate more quickly from a first draft, then layer in style, personality, and accessibility. Designers can also achieve greater consistency across components and systems more quickly (especially useful in large enterprise environments), and spend more time on experience thinking (goals, flows, interactions) and less on repetitive Figma layout tasks.

Leadership’s view: Augmenting opportunities

As someone thinking about how we deliver software at scale for ourselves and our clients, I’m cautiously excited. There’s a real opportunity here to shorten time-to-first-value, especially for internal tools and minimum viable products (MVPs). There’s also a big opportunity to enable cross-functional teams to participate more deeply in product development, and to free engineers and designers from boilerplate so they can focus on the high-value problems.

But we also need to acknowledge this isn’t a ‘replace your team with AI’ moment. This is ‘augment your team with AI’, and teams will need to work harder to derive genuine value from the tools.

You need more engineering and design leadership, not less, to ensure AI-generated software meets your standards. The tools are still maturing, and so collaboration, version control, UI design, and accessibility all need active consideration. And clients with regulatory constraints will need more transparency and auditability than these tools currently provide.

Building better, not just faster

There’s no doubt that vibe coding can increase delivery velocity. But the real test will be whether we can use it to build better software, not just faster software. That’s going to take thoughtful processes, experienced humans, and the right mix of creativity and caution.

We’re continuing to experiment with vibe coding techniques on low-risk internal projects, and we’re now running fortnightly workshops across our Digital and Data teams to allow more designers, architects, and software engineers to join a simulated environment to learn, share, and explore away from the restrictions of client engagements.

About the authors

Nick Dunn, Director of Engineering
