Verification in AI-Assisted Coding: What It Is and What We're Building

Raluca Crisan
February 23, 2026
Blog

At Etiq, the tools we've developed have converged, somewhat unexpectedly, on an area we're calling verification in AI-assisted coding. It's not where we started, and we're not certain the label will stick, but the pattern is clear enough to be worth describing. This post sets out what we mean by verification, how it already shows up in mainstream AI coding tools, and how our approach differs.

What verification means in this context

When we talk about verification, we're referring to a subset of the techniques that AI coding assistants use to improve and validate their own outputs. These techniques span a broad spectrum: static analysis and formal methods; runtime logging and agent tracing; agent loops that iterate on test results and build outputs; and grounding with real metadata for data copilots and RAG systems.

The problems these techniques address are significant. LLM-based coding agents hallucinate files, data, and function outputs, referencing artifacts that don't exist or producing content that doesn't match what was requested. Debugging spirals are common, where an agent's attempt to fix a bug introduces new ones. Human users struggle to keep track of what the agent has actually done versus what it claims to have done. And trust, however you want to define it, remains a persistent issue in any workflow where an LLM is generating or modifying code.

Some verification techniques operate closer to the model itself than others, for example as part of general inference-time scaling. Some verifiers include an LLM or model component, whereas others are symbolic and rule-based. Our interest is specifically in the latter: deterministic verification that produces consistent results. Something you can rely on, not something that's probabilistically correct.

How verification shows up: Claude Code

A concrete example helps illustrate the pattern. Claude Code's system prompt instructs the agent to perform a directory verification step before creating files or directories. Specifically, the prompt directs: "If the command will create new directories or files, first use the ls tool to verify the parent directory exists and is the correct location." This is documented in Claude Code's reverse-engineered system prompt, which also instructs the agent to verify solutions with tests where possible and to run lint and type-check commands after completing tasks.

This is, in itself, a form of verification: the agent checks conditions in the real filesystem before and after acting, rather than assuming its operations succeeded. It's not fully deterministic (the agent doesn't always enforce these checks uniformly), and it's not externally imposed (it's baked into the system prompt rather than architecturally guaranteed). But it works, and it meaningfully reduces the rate at which the agent produces phantom outputs.
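The same pre-flight check is easy to express outside an agent. Here's a minimal sketch in Python: the function name and structure are illustrative (this is not Claude Code's code), but the pattern is the one its prompt describes, verifying the parent directory exists before writing rather than assuming it does.

```python
import os

def safe_write(path: str, content: str) -> None:
    """Write a file only after verifying its parent directory exists.

    Illustrative only: mirrors the pre-flight check described in
    Claude Code's system prompt, not its actual implementation.
    """
    parent = os.path.dirname(os.path.abspath(path))
    # The verification step: check reality before acting.
    if not os.path.isdir(parent):
        raise FileNotFoundError(f"parent directory does not exist: {parent}")
    with open(path, "w") as f:
        f.write(content)
```

The failure mode this prevents is the phantom output: without the check, an agent can believe it wrote a file into a directory that was never created.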

The general verification loop

Regardless of where in the stack it is implemented, verification tends to follow a common loop. An LLM generates some candidate output: a proof, a code snippet, a reasoning step. A checking mechanism evaluates that output. If the check fails, the feedback (e.g. an error message, a failing test) is returned to the LLM for correction. This repeats until the verifier accepts the output or some retry limit is reached.
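The loop above can be sketched in a few lines. In this hedged example, `generate` stands in for an LLM call and `check` for any verifier (a test suite, a linter, a type checker); both names and the retry structure are assumptions for illustration, not any particular tool's API.

```python
from typing import Callable, Tuple

def verify_loop(generate: Callable[[str], str],
                check: Callable[[str], Tuple[bool, str]],
                prompt: str,
                max_retries: int = 3) -> str:
    """Generic generate-check-correct loop (illustrative sketch)."""
    feedback = ""
    for _ in range(max_retries):
        candidate = generate(prompt + feedback)
        ok, message = check(candidate)
        if ok:
            return candidate          # verifier accepted the output
        # Failure feedback is appended to the prompt for the next attempt.
        feedback = f"\nPrevious attempt failed: {message}"
    raise RuntimeError("verifier did not accept any candidate within retry limit")
```

Everything discussed in this post, from Claude Code's filesystem checks to formal proof checking, is a choice of `check` in this loop; what varies is how deterministic that checker is.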

A minimal agent example: Pi

For a clean, practical example of this loop in an agent context, consider Pi, the minimal coding agent created by Mario Zechner that powers the OpenClaw project (formerly Clawdbot). Pi has exactly four tools (read, write, edit, bash) and a system prompt under 1,000 tokens. Its core loop is straightforward: stream the LLM's response, execute any tool calls the assistant requests, feed the tool results back into context, and repeat until the agent signals it's done.

The "verification" here (and Zechner probably wouldn't call it that) is the execution of the tools themselves. When the agent writes a file and then runs a bash command to test it, the results of that execution become ground truth that the LLM assesses and uses to steer its next action. The agent isn't verifying in a formal sense, but the dynamic is the same: generate, check against reality, correct if needed.
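A Pi-style loop can be sketched as follows. This is a hypothetical simplification, not Pi's actual code (which streams responses and handles far more detail): `llm` is a placeholder for the model call, and the tool table covers only a subset of Pi's four tools.

```python
import subprocess

def _write(path, content):
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} bytes"

# A minimal tool table in the spirit of Pi's read/write/edit/bash.
TOOLS = {
    "bash": lambda cmd: subprocess.run(cmd, shell=True, capture_output=True,
                                       text=True).stdout,
    "read": lambda path: open(path).read(),
    "write": _write,
}

def agent_loop(llm, messages):
    """Run until the model stops requesting tools.

    `llm` is assumed to return either ("tool", name, args)
    or ("done", final_text); this protocol is invented for the sketch.
    """
    while True:
        step = llm(messages)
        if step[0] == "done":
            return step[1]
        _, name, args = step
        result = TOOLS[name](*args)                     # execute against reality
        messages.append(("tool_result", name, result))  # ground truth fed back
```

The point of the sketch is how little machinery is needed: the "checking mechanism" is simply whatever the tool execution returns, appended to context for the model's next step.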

What Etiq does differently

What we've built at Etiq is a different kind of verifier, which opens up different options for different use cases.

Our tool parses code and captures the logic, the sequence of steps in a script or pipeline (and as you’ll see in our docs we are focused on specific use cases). It then captures the artifacts produced at every step and links each artifact to the function that produces it. It does this deterministically, with minimal overhead and no instrumentation needed (with the usual caveat that we're still fixing bugs).

Once we have this information, we can verify artifacts, trace their relationships with each other, perform data testing, and search through the lineage for the root cause when tests fail. 

At a basic level, the loop looks like this: the agent produces an initial script; we parse it and map the artifacts and their producer functions; we run each producer sequentially based on parent-child dependency; we verify that the artifact was produced and passes defined criteria; if it fails, we loop back and retry. If it passes, the agent moves to the child artifact.
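The per-artifact loop above can be sketched like this. Names such as `producers`, `criteria`, and `MAX_RETRIES` are assumptions for illustration, not Etiq's API; the dependency order is assumed to arrive already topologically sorted from the parsing step.

```python
MAX_RETRIES = 3

def run_pipeline(order, producers, criteria, artifacts):
    """Run each producer in parent-child dependency order,
    verifying its artifact before moving on to the child.

    Illustrative sketch of the loop described in the post.
    """
    for name in order:                        # parents before children
        produced = None
        for _ in range(MAX_RETRIES):
            produced = producers[name](artifacts)
            # Verify the artifact was produced and passes its criteria.
            if produced is not None and criteria[name](produced):
                artifacts[name] = produced
                break                         # verified: move to the child
        else:
            raise RuntimeError(f"artifact '{name}' failed verification")
    return artifacts
```

Because each artifact is verified before its children run, a failure is localized to one producer rather than surfacing downstream as a confusing cascade.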

For more test-heavy workflows, the loop adds a lineage-graph-search dimension: test the output artifacts; if a test fails, walk up to the parent artifact and test that; if the parent passes, the problem is in the child's producer function; if the parent also fails, continue walking up the dependency tree until you isolate the root cause.
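The walk-up search can be sketched as a short traversal. Again this is illustrative, not Etiq's implementation: `parents` is assumed to map each artifact to its single parent (`None` at the root), with `tests` and `values` holding per-artifact checks and results.

```python
def find_root_cause(artifact, parents, tests, values):
    """Walk up the lineage until a failing artifact has a passing
    parent: that artifact's producer function is the root cause.

    Illustrative sketch; assumes one parent per artifact.
    """
    current = artifact
    while not tests[current](values[current]):
        parent = parents[current]
        if parent is None or tests[parent](values[parent]):
            return current   # parent passes (or no parent): blame current's producer
        current = parent     # parent also fails: keep walking up
    return None              # the artifact itself passes; nothing to diagnose
```

The useful property is that the search terminates at the highest failing artifact whose inputs are clean, which is exactly where a human debugger would want to start.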

In practice both loops are more complex than described here, but the core idea holds: deterministic parsing gives us a map of what should exist and how it relates, and that map becomes the backbone for structured, traceable verification.

Why this matters

The underlying bet is that as AI-assisted coding becomes more autonomous, with longer agent runs, more complex pipelines, less human oversight per step, the need for reliable, deterministic verification grows. Probabilistic self-checks are better than nothing. But deterministic verification, grounded in actual code structure and artifact dependencies, gives you something firmer to build on.

We're still exploring this space and welcome thoughts and suggestions. If you want to see it in action, try Etiq.