engineering / product

Reducing Review Noise: How We Turned Down the Volume

Revvu Team / April 7, 2026 / 5 min read

Three weeks after launching the first version of Revvu, we got a message from one of our earliest users that stopped us cold: "I've started collapsing all of your bot's comments without reading them. It posts the same things every time I push." This was someone who had been excited about the product. They'd installed it on day one. They believed in the idea. And we were losing them.

The reviews weren't wrong. The findings were real — actual issues in actual code. But the bot couldn't stop repeating itself. Every time the developer pushed new commits to address feedback, the bot would re-analyze the whole change and post everything again — including the comments that were already there. Good feedback, delivered badly, becomes noise. And noise gets ignored.

[Image: dozens of identical sticky notes overlapping on a dark wall]
When signal gets buried in repetition, even good feedback gets ignored.

The problem with saying it twice

The math was brutal. Fix one issue out of ten? Nine duplicate comments appear on your next push. Push three times to address feedback? That's 27 comments where there should be 10. Imagine a colleague who gives you the same feedback every single time you talk to them, even after you've addressed it. You'd tune them out by the third conversation. That's exactly what was happening.

The obvious fixes that didn't work

Our first attempt was matching by text — has the bot already posted a comment with this exact wording on this file? It caught about 30% of duplicates, which meant 70% still got through. The problem is that AI doesn't produce identical text on every run. "This variable could be null" becomes "Consider adding a null check here" on the next analysis. Same issue, different words, no match.
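That first attempt amounted to something like this minimal sketch (the `is_duplicate` helper is hypothetical, not our actual code):

```python
# Hypothetical sketch of the first attempt: exact-text matching.
def is_duplicate(new_comment: str, existing_comments: list[str]) -> bool:
    """A finding counts as 'already posted' only if the wording matches exactly."""
    return any(new_comment == prior for prior in existing_comments)

# Same issue, different wording, no match:
print(is_duplicate("Consider adding a null check here",
                   ["This variable could be null"]))  # False
```

Any paraphrase slips straight through the equality check, which is why this caught so few duplicates.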

We tried matching by line number next — is there already a comment on line 42? That failed even worse. Every time someone pushes new commits, line numbers shift. A comment on line 42 might now correspond to line 47 or line 39. And sometimes two completely different issues appear on the same line. Simple solutions weren't going to work. We needed the system to understand what a comment means, not just what it says.

The breakthrough

What finally worked was giving each comment a unique identity — a signature based on the file, the surrounding code context, and the core meaning of the feedback, stripped of the specific wording. We normalize the text so that "this could be null" and "add a null check" produce the same signature. It's not perfect, but it catches about 95% of duplicates — enough that the remaining misses are rare rather than constant.
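As a rough sketch of the idea in Python (the stopword list, helper names, and truncated hash are illustrative assumptions, not our production code):

```python
import hashlib
import re

# Illustrative wording-variation words to strip. Everything in this set is
# an assumption for the demo; a real list would be far more careful.
STOPWORDS = {"a", "an", "the", "this", "that", "could", "be", "is",
             "here", "consider", "adding", "add", "check", "variable"}

def normalize(finding: str) -> str:
    """Reduce a finding to its sorted content words so paraphrases converge."""
    tokens = re.findall(r"[a-z_]+", finding.lower())
    return " ".join(sorted(t for t in tokens if t not in STOPWORDS))

def signature(file_path: str, context: str, finding: str) -> str:
    """Fingerprint = file + surrounding code context + normalized meaning."""
    raw = "|".join((file_path, context.strip(), normalize(finding)))
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

# Two phrasings of the same issue produce the same signature:
assert signature("db.py", "user = fetch()", "This variable could be null") == \
       signature("db.py", "user = fetch()", "Consider adding a null check here")
```

The file path and code context anchor the signature to a location; the normalized text anchors it to a meaning. Change any of the three and you get a different fingerprint.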

[Image: a single fingerprint glowing with soft blue light against a black background]
Every comment gets a unique identity based on meaning, not wording.

Each signature is embedded invisibly in the posted comment. On future reviews, the system reads its own previous comments, recognizes their signatures, and decides what to do. It's like leaving yourself a note that only you can read.

Three buckets

With this system in place, every finding on a new review falls into one of three categories. New issues — things we haven't flagged before — get posted as fresh comments. Existing issues that are still present stay quiet. The original comment is already there; posting it again would just be noise. And fixed issues — the ones the developer actually addressed — get an automatic "Resolved" reply, and the conversation thread collapses in GitHub. That last category is the most satisfying part to watch. You fix something the bot flagged, push your code, and the thread closes itself. It's a small thing, but it feels like the bot is acknowledging your work instead of just pointing at problems.
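In code, the triage reduces to set differences on signatures. A minimal sketch, where the names and data shapes are assumptions rather than our actual schema:

```python
def triage(new_findings: dict, previous: dict):
    """Split findings into the three buckets by signature.

    new_findings: signature -> finding text from the latest analysis
    previous:     signature -> comment id of an already-posted comment
    """
    # New issues: never posted before -> post fresh comments.
    post = {s: f for s, f in new_findings.items() if s not in previous}
    # Still-present issues: already posted -> stay quiet.
    silent = {s: previous[s] for s in new_findings if s in previous}
    # Fixed issues: previously posted, no longer found -> auto-resolve.
    resolved = {s: cid for s, cid in previous.items() if s not in new_findings}
    return post, silent, resolved
```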

The subtle trap

After shipping this system, we noticed something strange. Comments on files that hadn't changed in the latest push were being marked as "fixed" — even though the developer hadn't touched them. The issue was simple once we saw it: if a file wasn't in the new push, it wasn't in the new analysis, so the system assumed the issue was gone. The absence of evidence became evidence of absence. The fix was teaching the system to only evaluate comments on files that actually changed in the latest push. Everything else stays exactly as it is — untouched, unresolved, waiting for the developer to get to it. It sounds obvious in retrospect, but it took a real bug to see it.
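The guard is a single early `continue`. A sketch, with assumed data shapes:

```python
def resolvable_comment_ids(previous_comments, new_signatures, changed_files):
    """Return the comment ids that are safe to auto-resolve.

    previous_comments: iterable of (signature, file_path, comment_id)
    new_signatures:    set of signatures the new analysis produced
    changed_files:     set of paths actually touched by the latest push
    """
    to_resolve = []
    for sig, path, comment_id in previous_comments:
        if path not in changed_files:
            # File untouched: no new evidence either way. Leave the comment.
            continue
        if sig not in new_signatures:
            # File was re-analyzed and the issue is gone: genuinely fixed.
            to_resolve.append(comment_id)
    return to_resolve
```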

85% fewer comments

After shipping this fix, the number of comments on follow-up pushes dropped by about 85%. Most findings were either still present (silenced — the original comment was already there) or fixed (auto-resolved — the thread collapsed). The remaining 15% were genuinely new issues introduced by the latest push. The noise was gone.

[Image: clean minimal desk with laptop and coffee cup in warm morning light]
85% fewer comments on follow-up pushes. Just the signal, none of the repetition.

The user who'd sent us that original message? They turned the bot back on the next day. They've been using it on every PR since. That single metric — did the person who gave up come back — matters more to us than any feature checklist.

Accuracy vs. usefulness

This experience taught us something we keep coming back to: accuracy and usefulness are not the same thing. Our reviews were accurate the entire time — every finding was a real issue in real code. But they weren't useful because the delivery was broken. We didn't fix what the bot said. We fixed how it said it. That single change transformed how people felt about the product.

The other lesson was about testing. We'd been testing with single-push PRs — open a PR, get a review, done. That's the demo workflow. The real workflow involves multiple pushes, incremental fixes, back-and-forth with reviewers, and iteration. The noise problem only appeared when we used the product the way our users actually do. We don't test with demos anymore.