Building a skill optimization loop

This post shows how to create a loop with automated feedback that an agent can run to optimize its own Skills. It uses an automated grader with computer use to assess how well a Skill is performing, and then iteratively improves the Skill.
You can apply this technique to create improvement loops any time a Skill has clear validation criteria.
I’ll walk through this with a Skill for replatforming websites from WYSIWYG platforms to self-hosted code (we did this recently for our own marketing site). Let’s call this Skill /replatform-site (source here). It’s a generally useful Skill, but this post is less about this particular use case than about the meta-process of evaluating and improving Skills in a loop.

Let’s say, for fun, I’m starting a new podcast called Talking Slop, where I go over AI trends with other people who like to chat about them. You can see that I’ve stood this site up at talkingslop.ai, and it’s currently hosted on a WYSIWYG no-code platform. I think I could slopmaxx this site even faster if it were hosted as code on Vercel, so I’m going to run /replatform-site on it.
The initial run got pretty far, but the result has one obvious visual defect: the icons for the dropdown toggles are missing.

You can compare talkingslop.ai to the generated port at talking-slop.vercel.app.
This is where the loop comes in. Since this is a verifiable task, you can create loops to automatically improve it. In fact, there are two loops you might implement here:
- A loop to make sure that this specific replatforming worked.
- A loop to make the replatforming Skill itself more likely to work better next time.
I’m going to focus on the second kind of loop here, the outer loop. See my post on self-improvement loops for more on the inner vs. outer loop distinction.
I set up this loop through another Skill, using a pattern of creating an “observer” Skill (source) that grades the quality of the “inner” Skill. This observer Skill takes N sites to replatform as input, calls /replatform-site on each of them, and then builds those sites and examines them for behavioral and visual differences using computer use and browser use. It also tracks how many tokens the replatforming took and attempts to optimize cost while maintaining quality.
It synthesizes the results using a SOTA model, looks for failure patterns and opportunities to improve, and then creates a diff to update the inner Skill. Because Skills are just files, you can use any coding agent like Warp to do this analysis and submit a PR to update the Skill.
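To make the shape of one pass concrete, here is a minimal Python sketch. It is not the observer Skill's actual implementation: `observe_once`, `SiteRun`, and the three callables (`replatform`, `grade`, `propose_update`) are hypothetical names standing in for the agent calls described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class SiteRun:
    """Outcome of replatforming one site (hypothetical record shape)."""
    url: str
    defects: list[str]  # visual/behavioral differences the grader found
    tokens_used: int    # cost of the replatforming run


def observe_once(
    sites: list[str],
    skill_text: str,
    replatform: Callable[[str, str], int],  # (skill_text, url) -> tokens used
    grade: Callable[[str], list[str]],      # url -> defects vs. the original
    propose_update: Callable[[str, list[SiteRun]], str],
) -> tuple[str, list[SiteRun]]:
    """Run the inner Skill on every site, grade each port, and return
    an updated Skill text together with the raw per-site results."""
    runs: list[SiteRun] = []
    for url in sites:
        tokens = replatform(skill_text, url)
        runs.append(SiteRun(url=url, defects=grade(url), tokens_used=tokens))
    # Assumption: the synthesis step sees every run at once, so it can
    # look for patterns rather than patching one site at a time.
    return propose_update(skill_text, runs), runs
```

Passing the agent steps in as plain callables keeps the loop itself testable with fakes before you spend tokens on real runs.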

The replatforming “observer” Skill
The observer Skill records its results as structured data so that it can be as intelligent as possible when suggesting fixes to the inner Skill. It’s fairly sophisticated, but the general concept is simple: run the inner Skill, record its failures, make a diff to improve it, repeat.
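As an illustration of why structured results help, here is a small Python sketch that groups findings by category to surface defects that recur across sites. The `Finding` fields, the category names, and the `min_sites` threshold are all assumptions for the example, not the observer Skill's real schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    """One graded difference between the original site and its port."""
    site: str
    category: str  # hypothetical taxonomy, e.g. "missing-icon"
    detail: str


def failure_patterns(
    findings: list[Finding], min_sites: int = 2
) -> dict[str, list[str]]:
    """Return categories seen on at least `min_sites` distinct sites."""
    sites_by_category: dict[str, set[str]] = {}
    for f in findings:
        sites_by_category.setdefault(f.category, set()).add(f.site)
    return {
        category: sorted(sites)
        for category, sites in sites_by_category.items()
        if len(sites) >= min_sites
    }


def to_json(findings: list[Finding]) -> str:
    """Serialize findings so they can be handed to the synthesis step."""
    return json.dumps([asdict(f) for f in findings], indent=2)
```

A defect that shows up on several sites is a better candidate for a change to the inner Skill than a one-off, which is more likely specific to a single site.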
In order to run this loop, you’ll want to use a platform that supports orchestration of multiple agents along with computer use. I used Oz, which is built into Warp and supports computer use across multiple SOTA models, but there are plenty of options out there (if you want to try it in Oz, I made a separate Skill for it called /oz-orchestrated-replatforming).
Here’s a version of the diff it created on an early run on talkingslop.ai, where it noticed the dropdown issue and suggested an improvement:

If you want to tune this Skill on a significant corpus for wider use, you can scale the number of input sites and keep iterating until the diffs generated by the observer become less meaningful. The observer itself has exit criteria baked into it for when to stop looping so you don’t burn tokens optimizing forever.
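For a sense of what exit criteria could look like, here is one possible stopping rule in Python. The real observer's criteria may differ; `should_stop`, `max_iterations`, and `patience` are hypothetical, and the rule assumes you record the total defect count after each iteration.

```python
from __future__ import annotations


def should_stop(
    defect_counts: list[int], max_iterations: int = 10, patience: int = 2
) -> bool:
    """Decide whether to end the loop, given total defects per iteration.

    Stops when the iteration budget is spent, when no defects remain,
    or when the last `patience` iterations failed to beat the best
    earlier result (a plateau).
    """
    if not defect_counts:
        return False  # nothing has run yet
    if len(defect_counts) >= max_iterations:
        return True   # hard budget so the loop cannot run forever
    if defect_counts[-1] == 0:
        return True   # nothing left to fix
    if len(defect_counts) > patience:
        best_before = min(defect_counts[:-patience])
        if min(defect_counts[-patience:]) >= best_before:
            return True  # recent iterations made no progress
    return False
```

The plateau check is the part that responds to diffs becoming less meaningful; the hard iteration cap is a backstop in case defect counts bounce around without settling.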

This system isn’t perfect – there’s only so much you can improve by tuning Skills and it’s susceptible to finding local maxima – but it’s pretty handy as a simple way of making sure your Skills perform well.
If you want to give it a go, all of the relevant skills are open sourced here: https://github.com/warpdotdev-demos/replatformer
And stay tuned for my first episode of Talking Slop 😉