From prompt to PR: designing a human-in-the-loop review pipeline
The promise of AI coding agents is speed. The risk is that speed without oversight produces bad code faster. Every team we talked to during development had the same concern: 'How do I stay in control without becoming a bottleneck?'
The answer isn't more automation - it's better surfaces for human judgment. We designed Phasr's review pipeline around one principle: agents propose, humans approve. Every AI-generated change is a proposal until a human reviews and merges it. Nothing lands in your main branch without explicit approval.
The pipeline has three stages. First, the agent works on its task in an isolated worktree, streaming its progress in real time. You can watch the diff evolve as the agent writes code - file by file, hunk by hunk. This isn't a black box. You see exactly what's happening, as it happens.
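One way to picture the streaming review surface: the agent emits diff events as it works, and the viewer folds them into a live, per-file structure. This is a minimal sketch, not Phasr's implementation - the `Hunk`, `LiveDiff`, and event names are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Hunk:
    header: str       # e.g. "@@ -10,2 +10,4 @@"
    lines: list[str]  # added/removed/context lines

@dataclass
class LiveDiff:
    """Accumulates streamed hunks into a per-file view the UI can re-render."""
    files: dict[str, list[Hunk]] = field(default_factory=dict)

    def apply(self, path: str, hunk: Hunk) -> None:
        self.files.setdefault(path, []).append(hunk)

# Simulated event stream from a working agent: (path, hunk) pairs.
events = [
    ("auth/login.py", Hunk("@@ -10,2 +10,4 @@", ["+def verify_token(t):", "+    ..."])),
    ("auth/login.py", Hunk("@@ -40,1 +42,1 @@", ["-import jwt", "+import jose"])),
    ("tests/test_login.py", Hunk("@@ -1,0 +1,3 @@", ["+def test_verify_token():"])),
]

view = LiveDiff()
for path, hunk in events:
    view.apply(path, hunk)  # in the real UI, the diff re-renders after each event

print(sorted(view.files))  # files touched so far, hunk by hunk
```

The point of the event model is that the reviewer never sees a frozen snapshot - each hunk lands as the agent produces it.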
Second, when the agent signals it's done, the review surface appears. This is a purpose-built diff viewer - not a terminal dump, not a raw Git diff. It highlights the semantic structure of changes: new functions, modified signatures, changed imports. Inline annotations flag areas that might need attention: large deletions, new dependencies, modified test assertions.
Third, you act. Approve the changes with one click, and a PR is created against your target branch. Request modifications, and the agent re-enters its worktree with your feedback. Discard the task entirely, and the worktree is cleaned up. There are no half-states. Every task resolves to a clean outcome.
We iterated on the review UX extensively. Early versions showed the full diff at once, which was overwhelming for large changes. We switched to a file-by-file view with a summary panel that highlights the most-changed files. We also added a 'risk score' - a heuristic that flags changes to critical paths (auth, payments, database migrations) so reviewers know where to focus.
One feature that surprised us with its usefulness: inline chat. You can leave a comment on any line of the diff, and the agent will see it as context if you ask it to revise. It's like a code review conversation, but one participant is an AI that can immediately act on your feedback.
The result is a pipeline where agents run at full speed - the work itself never blocks on a human - while humans maintain full control over what ships. It's not a tradeoff between speed and quality. With the right review surface, you get both.