The AI Tools Actually Running Agency Work Right Now

Introduction

Agency work is repetitive in specific ways. Rebuilding a client’s website means cleaning years of plugin clutter and recreating pages one by one. Producing branded graphics means resizing the same assets across a dozen formats, checking hex values, making sure the logo isn’t too close to the edge. It’s skilled work, but a lot of it is mechanical — and that’s exactly where AI tools are starting to matter.

This week we went deep inside two of them. Here’s what we actually found.

Website AI — The Wizard Is Wired

Week two was about going from a plan to something that actually runs. The pipeline starts with a seven-step intake wizard that collects everything Claude needs to build the site: source URL, site type, brand identity, colours, typography, tone of voice, and design direction. All of that is packaged into a single BuildBrief JSON object that feeds every downstream stage. From there the pipeline moves through sitemap generation, design token creation, and per-page HTML output, with a review step after each page before the next one is generated.
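To make the BuildBrief idea concrete, here is a minimal sketch of what such an object and the page-by-page review gate might look like. The field names, the SiteType values, and the buildPages helper are our own illustrative assumptions, not the project’s actual lib/types.ts. The Claude call and the reviewer are passed in as functions so the control flow stands on its own. In this sketch a rejected page simply halts the run.

```typescript
// Illustrative BuildBrief shape: one field per wizard step.
// Field names and SiteType values are assumptions, not the project's types.
type SiteType = "brochure" | "ecommerce" | "portfolio" | "blog";

interface BuildBrief {
  sourceUrl: string;
  siteType: SiteType;
  brandIdentity: string;
  colours: { primary: string; secondary: string; accent: string };
  typography: { heading: string; body: string };
  toneOfVoice: string;
  designDirection: string;
}

interface PageSpec {
  slug: string;
  title: string;
}

type ReviewDecision = "approve" | "reject";

// Generates pages strictly one at a time: the next page is only requested
// after the reviewer approves the current one. A rejection stops the run
// so a human can intervene (an assumption made for this sketch).
async function buildPages(
  brief: BuildBrief,
  sitemap: PageSpec[],
  generatePage: (brief: BuildBrief, page: PageSpec) => Promise<string>,
  review: (page: PageSpec, html: string) => Promise<ReviewDecision>,
): Promise<Map<string, string>> {
  const approved = new Map<string, string>();
  for (const page of sitemap) {
    const html = await generatePage(brief, page);
    if ((await review(page, html)) === "reject") break;
    approved.set(page.slug, html);
  }
  return approved;
}
```

Passing the whole brief into every generation call is what lets each stage stay consistent with the same brand inputs.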

The stack is Next.js 14, with the App Router handling both the wizard frontend and the backend API routes in the same repo. The generated sites themselves are static HTML, CSS, and vanilla JavaScript. Three separate prompt files handle the Claude calls (one for the sitemap, one for design tokens, one for page generation), each returning either structured JSON or raw HTML.
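The split between JSON-returning and HTML-returning prompts can be sketched as a small routing table plus a parser that handles each stage’s reply accordingly. The prompt file paths and the parseStageOutput function are hypothetical. The idea is that the expected output kind lives next to the prompt, so a malformed reply fails loudly at the stage that produced it.

```typescript
// Hypothetical routing table: each stage knows its prompt file and the
// kind of output it should return. Paths are illustrative only.
type Stage = "sitemap" | "designTokens" | "page";
type OutputKind = "json" | "html";

const PROMPTS: Record<Stage, { file: string; output: OutputKind }> = {
  sitemap: { file: "prompts/sitemap.ts", output: "json" },
  designTokens: { file: "prompts/design-tokens.ts", output: "json" },
  page: { file: "prompts/page.ts", output: "html" },
};

type StageResult =
  | { stage: Stage; kind: "json"; data: unknown }
  | { stage: Stage; kind: "html"; html: string };

// Parses a raw model reply according to the stage's expected output kind.
// Throws instead of passing bad output on to the next stage.
function parseStageOutput(stage: Stage, raw: string): StageResult {
  const body = raw.trim();
  if (PROMPTS[stage].output === "json") {
    // JSON.parse throws a SyntaxError on malformed JSON.
    return { stage, kind: "json", data: JSON.parse(body) };
  }
  if (!body.startsWith("<")) {
    throw new Error(`Stage "${stage}" expected raw HTML but got text`);
  }
  return { stage, kind: "html", html: body };
}
```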

The wizard is fully wired and the pipeline architecture is complete. On the wizard side, that means animated slide transitions, live font previews, a two-panel design style picker with real rendered mockups, and the full Goose Digital brand throughout. The type system is tight: a single lib/types.ts file defines the types every stage shares, end to end. Claude’s JSON reliability is higher than expected: schema-constrained prompts consistently return valid structured output without JSON mode enforcement. The pipeline is built but not yet tested end to end because the API key isn’t in place yet. That’s the first thing on the board for week three.
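“Schema-constrained” here means the schema is written into the prompt rather than enforced by an API setting. A minimal sketch of that approach, assuming a hypothetical Sitemap shape: the prompt spells out the structure, and a type guard checks the reply before anything downstream trusts it. Stripping markdown fences is a defensive assumption on our part, not something the project is confirmed to need.

```typescript
// Hypothetical sitemap shape returned by the sitemap prompt.
interface SitemapPage {
  slug: string;
  title: string;
  sections: string[];
}

interface Sitemap {
  pages: SitemapPage[];
}

// The schema lives in the prompt text, not in an API-level JSON mode flag.
const SITEMAP_SCHEMA =
  '{"pages":[{"slug":"string","title":"string","sections":["string"]}]}';

function sitemapPrompt(briefJson: string): string {
  return [
    "Return ONLY JSON that matches this schema, with no extra prose:",
    SITEMAP_SCHEMA,
    "Brief:",
    briefJson,
  ].join("\n");
}

function isSitemapPage(v: unknown): v is SitemapPage {
  if (typeof v !== "object" || v === null) return false;
  const p = v as Record<string, unknown>;
  const sections = p.sections;
  return (
    typeof p.slug === "string" &&
    typeof p.title === "string" &&
    Array.isArray(sections) &&
    sections.every((s) => typeof s === "string")
  );
}

// Validates a reply without trusting it; returns null on any mismatch.
// Markdown fences are stripped defensively in case the model adds them.
function parseSitemap(raw: string): Sitemap | null {
  const fenced = raw.trim().match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/);
  const body = fenced?.[1] ?? raw.trim();
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const pages = (data as Record<string, unknown>).pages;
  if (!Array.isArray(pages) || !pages.every(isSitemapPage)) return null;
  return { pages };
}
```

Returning null rather than throwing leaves the caller free to decide whether to re-prompt or stop.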

Image AI — Three Steps, One Pipeline

The image pipeline is further along in a different way — it has been running and generating real outputs, which means it has also been failing in real and instructive ways.

The system chains three AI steps, the last of which is optional:

[ image-ai-pipeline ]

step-1 :: layout LLM (Gemini 2.5 Pro)     — reads brand kit + prompt, outputs full JSON layout
                                             with positions, z-ordering, and image prompts per zone
step-2 :: image gen (Nano Banana 2)        — fills each zone with photos, scenes, product shots;
                                             brand assets baked in as visual references
step-3 :: vision analysis (Gemini 2.5 Pro) — optionally decomposes an uploaded design into layers
                                             to replicate it in a new brand's style
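Step 1’s output is the contract the other steps work against. The sketch below assumes a hypothetical Layout/Zone schema (pixel positions, a numeric z value, an optional imagePrompt) and shows two things the diagram implies: deriving one image-generation job per image zone, with brand assets attached as references, and ordering zones by z for rendering. The optional vision step is left out, and none of the field names come from the actual project.

```typescript
// Hypothetical layout schema for step 1's JSON output.
interface Zone {
  id: string;
  kind: "image" | "text" | "logo";
  x: number; // canvas pixels from the left edge
  y: number; // canvas pixels from the top edge
  width: number;
  height: number;
  z: number; // stacking order: higher values draw on top
  imagePrompt?: string; // present only on zones step 2 should fill
}

interface Layout {
  width: number;
  height: number;
  zones: Zone[];
}

// Step 2 input: one generation job per image zone, with brand assets
// attached as visual references.
interface ImageJob {
  zoneId: string;
  prompt: string;
  referenceAssets: string[];
}

function imageJobs(layout: Layout, brandAssets: string[]): ImageJob[] {
  return layout.zones
    .filter(
      (z): z is Zone & { imagePrompt: string } =>
        z.kind === "image" && typeof z.imagePrompt === "string",
    )
    .map((z) => ({ zoneId: z.id, prompt: z.imagePrompt, referenceAssets: brandAssets }));
}

// Canvas render order: lowest z first, so later zones paint over earlier ones.
function drawOrder(layout: Layout): string[] {
  return layout.zones
    .slice()
    .sort((a, b) => a.z - b.z)
    .map((z) => z.id);
}
```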

What this means in practice: upload a client’s brand kit once — colours, fonts, logos, tone, target audience — and generate on-brand creatives across channels without touching Photoshop. Edit text inline, swap images, tweak colours, re-prompt, export. One person doing what used to take a designer, a copywriter, and a calendar invite to align on feedback.

The intake side matters more than we expected: colour usage rules, image style notes, things to actively avoid. The more specific the brand kit, the less cleanup the output requires. A detailed intake gets outputs close to on-brand on the first pass; a vague one doesn’t. The creative direction still has to come from somewhere, and that part didn’t get automated.
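One way to picture why intake detail matters: each brand kit field becomes an explicit constraint line in the layout prompt, and empty lists simply produce no constraint. The BrandKit shape, brandConstraints, and the missingIntake gap check below are illustrative assumptions rather than the project’s code.

```typescript
// Hypothetical brand kit shape; fields mirror the intake items named above.
interface BrandKit {
  name: string;
  colours: { hex: string; usage: string }[]; // a usage rule per colour
  fonts: string[];
  imageStyle: string[]; // image style notes
  avoid: string[]; // things to actively avoid
  tone: string;
  audience: string;
}

// Turns the kit into explicit constraint lines for the layout prompt.
// Empty colour, image style, and avoid lists contribute no lines at all,
// which leaves the model to guess.
function brandConstraints(kit: BrandKit): string[] {
  return [
    ...kit.colours.map((c) => `Use ${c.hex} for ${c.usage}.`),
    `Fonts: ${kit.fonts.join(", ")}.`,
    ...kit.imageStyle.map((s) => `Image style: ${s}.`),
    ...kit.avoid.map((a) => `Never: ${a}.`),
    `Tone: ${kit.tone}. Audience: ${kit.audience}.`,
  ];
}

// A rough gap check run at intake, before any generation happens.
function missingIntake(kit: BrandKit): string[] {
  const gaps: string[] = [];
  if (kit.colours.length === 0 || kit.colours.some((c) => c.usage.trim() === "")) {
    gaps.push("colour usage rules");
  }
  if (kit.imageStyle.length === 0) gaps.push("image style notes");
  if (kit.avoid.length === 0) gaps.push("things to avoid");
  return gaps;
}
```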

What’s still rough: unusual brand rules (condensed display fonts, strict logo exclusion zones, very specific negative space requirements) tend to drift. Most errors accumulate at the handoffs between pipeline stages. Each step introduces a small amount of inconsistency, and those inconsistencies compound: by the time the layout model has positioned elements, the image model has filled the zones, and the canvas has rendered the output, small errors from each stage can add up to something that needs real attention.
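Because errors compound across handoffs, a cheap geometric check after each stage can catch some drift before the next model builds on it. A minimal sketch for one of the rules mentioned above, a logo exclusion zone, assuming axis-aligned rectangles in canvas pixels and a hypothetical clear-space margin (the Rect, Placed, and exclusionViolations names are ours):

```typescript
interface Rect {
  x: number; // left edge, canvas pixels
  y: number; // top edge, canvas pixels
  width: number;
  height: number;
}

interface Placed extends Rect {
  id: string;
}

// Grows a rectangle by `margin` on every side: the logo plus its clear space.
function inflate(r: Rect, margin: number): Rect {
  return {
    x: r.x - margin,
    y: r.y - margin,
    width: r.width + 2 * margin,
    height: r.height + 2 * margin,
  };
}

// Axis-aligned overlap test; rectangles that only touch edges do not overlap.
function overlaps(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width && a.y < b.y + b.height && b.y < a.y + a.height;
}

// Returns the ids of elements that intrude on the logo's exclusion zone.
// `margin` stands in for a brand rule such as "clear space equals logo height".
function exclusionViolations(logo: Rect, margin: number, elements: Placed[]): string[] {
  const zone = inflate(logo, margin);
  return elements.filter((e) => overlaps(zone, e)).map((e) => e.id);
}
```

Running a check like this on the layout JSON before image generation, and again after rendering, helps pin down which stage introduced a violation.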

What This Says About AI Tools for Agencies

The two projects surface the same pattern from different angles. Structured output reliability is higher than expected: modern LLMs follow schemas consistently enough to build real pipelines on, not just demos. Agent-driven content extraction and rebuild are mature and composable; the tooling exists and it works. Brand-aware creative generation is further along than the current discourse suggests, at least when the inputs are good.

The rough edges are consistent too. Anything outside the normal range (unusual content structures, unconventional brand rules, multi-stage pipelines where drift compounds) still needs human review. And tooling for the last mile, the point where AI output meets production reality, is still something most teams have to build themselves: validation steps, diff review, checkpoints where a human looks before anything ships.
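A last-mile checkpoint can be as simple as the sketch below: a list of automated checks that report issues, a shallow diff between versions for the reviewer, and a ship gate that requires both clean checks and explicit human sign-off. The Check type, checkpoint, changedKeys, and canShip are hypothetical names, and comparing values with JSON.stringify is a shortcut that is sensitive to key order.

```typescript
// A check inspects an artifact and returns human-readable issues (empty = pass).
interface Check<T> {
  name: string;
  run: (artifact: T) => string[];
}

interface CheckpointResult {
  passed: boolean;
  issues: { check: string; issue: string }[];
}

// Runs every automated check and collects the issues for the reviewer.
function checkpoint<T>(artifact: T, checks: Check<T>[]): CheckpointResult {
  const issues: { check: string; issue: string }[] = [];
  for (const c of checks) {
    for (const issue of c.run(artifact)) issues.push({ check: c.name, issue });
  }
  return { passed: issues.length === 0, issues };
}

// Shallow diff between two versions, so the reviewer sees what changed.
// JSON.stringify comparison is a shortcut and is sensitive to key order.
function changedKeys(before: Record<string, unknown>, after: Record<string, unknown>): string[] {
  const keys = Array.from(new Set(Object.keys(before).concat(Object.keys(after))));
  return keys.filter((k) => JSON.stringify(before[k]) !== JSON.stringify(after[k]));
}

// Nothing ships on automated checks alone: a human sign-off is required too.
function canShip(result: CheckpointResult, humanApproved: boolean): boolean {
  return result.passed && humanApproved;
}
```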

The most useful mental model we landed on: AI tools work best in agency workflows when they generate something editable, not when they try to generate the final thing. A layout that renders onto a canvas you can still adjust. A site that gets reviewed page by page before it goes live. The systems that held up best this week were the ones designed around the assumption that a human will look at the output — not as a fallback, but as part of the workflow.

The tools are ready. The gap between promising demos and production is closing, and it closes faster for teams that treat AI output as a first draft.