Best Practices for Reviewing AI-Generated Code
A practical checklist for reviewing PRs that contain AI-generated code. What to look for, common AI code smells, and how to review effectively.
AI-generated code has a specific texture. Once you've reviewed enough of it, you start to recognize the patterns — the telltale signs that code was generated by Copilot, Cursor, Claude, or ChatGPT rather than written by hand. And more importantly, you learn where the bugs hide.
This isn't about policing whether your teammates used AI. It's about reviewing effectively when the code you're looking at may not reflect deliberate human decisions at every level.
Here's what to look for and how to review AI-generated code well.
The AI code smell catalog
1. Confident incorrectness
AI-generated code rarely looks uncertain. It doesn't leave TODO comments or question marks. It generates clean, well-formatted code that looks like it was written by someone who knew exactly what they were doing.
This is dangerous. Human-written uncertain code — a wonky variable name, a comment saying "I think this handles the edge case" — signals to the reviewer that extra scrutiny is needed. AI-generated code removes those signals.
What to do: Don't let clean formatting lower your guard. Read for correctness, not style. Ask yourself: "If I delete the implementation and just read the function signatures and tests, does this actually do what it should?"
2. Plausible but hallucinated APIs
AI models sometimes reference APIs, methods, or patterns that don't exist — or that exist in a different version of the library than your project uses. The code compiles and might even pass some tests, but it's using a method signature that was deprecated two versions ago, or calling a function with the wrong parameter order.
What to do: For any external library calls, verify the API against your actual dependency versions. Cmd+click into the function definition. If your IDE can't resolve it, the AI hallucinated it.
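One concrete way this shows up is a call that matches an older major version of a dependency. The sketch below uses lodash purely as an illustration; substitute whatever library your project actually depends on.

// Illustrative only: lodash 3.x had _.pluck, which was removed in 4.x.
// A model trained on older code can still generate it with full confidence.
import _ from 'lodash';

const users = [{ name: 'Ada' }, { name: 'Grace' }];

// Hallucinated against lodash 4.x -- your IDE won't resolve this:
// const names = _.pluck(users, 'name');

// The current equivalent, using the iteratee shorthand:
const names = _.map(users, 'name'); // ['Ada', 'Grace']

The point is the verification step, not the library: if you can't jump to the definition against your real dependency tree, treat the call as suspect.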
3. Over-engineering
AI tends to produce more code than necessary. Where a human developer might write a simple if statement, AI generates an abstract factory pattern. Where you need a utility function, AI creates a class hierarchy.
This happens because AI models are trained on a massive corpus that skews toward enterprise patterns and comprehensive error handling. The output is technically correct but inappropriately complex for your context.
What to do: For every abstraction the AI introduces, ask: "Does this codebase need this level of indirection?" If the answer is "not yet," simplify. The best code review comment you can leave on AI-generated code is "this works but it's more complex than it needs to be — can we simplify?"
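As a sketch of the pattern (all names here are hypothetical), this is the shape over-engineering often takes: a strategy hierarchy and a factory where a two-line function would do.

// What an AI might generate for "premium users get a 10% discount":
interface DiscountStrategy {
  apply(amount: number): number;
}

class PremiumDiscountStrategy implements DiscountStrategy {
  apply(amount: number): number {
    return amount * 0.9;
  }
}

class NoDiscountStrategy implements DiscountStrategy {
  apply(amount: number): number {
    return amount;
  }
}

class DiscountStrategyFactory {
  static forUser(user: { isPremium: boolean }): DiscountStrategy {
    return user.isPremium ? new PremiumDiscountStrategy() : new NoDiscountStrategy();
  }
}

// What most codebases actually need:
function applyDiscount(user: { isPremium: boolean }, amount: number): number {
  return user.isPremium ? amount * 0.9 : amount;
}

Neither version is wrong in the abstract; the question is whether your codebase has, or will soon have, more than one discount rule.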
4. Shallow test coverage
AI is great at generating tests that pass. It's terrible at generating tests that matter. A typical AI testing pattern:
// AI-generated test
test('should process payment', () => {
  const result = processPayment({ amount: 100, currency: 'USD' });
  expect(result.success).toBe(true);
});

This verifies the happy path. It doesn't test: what happens with a negative amount? With an unsupported currency? When the payment provider is down? When the amount exceeds the user's limit?
What to do: When reviewing AI-generated tests, list the edge cases you'd expect and check if they're covered. If the test file is suspiciously clean with no edge case tests, it's a red flag.
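Continuing the processPayment example above, the tests you'd want to see might look like the sketch below. The error codes, the limit, and the providerMock helper are assumptions about your API, not a prescription.

// Hypothetical edge-case tests -- error codes and helpers are assumed, not real.
test('rejects negative amounts', () => {
  const result = processPayment({ amount: -50, currency: 'USD' });
  expect(result.success).toBe(false);
  expect(result.error).toBe('INVALID_AMOUNT');
});

test('rejects unsupported currencies', () => {
  const result = processPayment({ amount: 100, currency: 'ZZZ' });
  expect(result.success).toBe(false);
  expect(result.error).toBe('UNSUPPORTED_CURRENCY');
});

test('fails cleanly when the payment provider is down', () => {
  providerMock.simulateOutage(); // stand-in for however you stub the provider
  const result = processPayment({ amount: 100, currency: 'USD' });
  expect(result.success).toBe(false);
  expect(result.error).toBe('PROVIDER_UNAVAILABLE');
});

test('rejects amounts over the user limit', () => {
  const result = processPayment({ amount: 1_000_000, currency: 'USD' });
  expect(result.success).toBe(false);
  expect(result.error).toBe('LIMIT_EXCEEDED');
});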
5. Context-blind patterns
This is the most subtle and most important code smell. AI doesn't know your system's constraints, conventions, or history. It generates code that's generically correct but contextually wrong.
Examples:
- Using synchronous I/O in an async-first codebase
- Creating a new database connection instead of using the existing pool
- Implementing custom auth logic when your app has an auth middleware
- Using REST conventions in a codebase that uses GraphQL
What to do: Review AI-generated code against your codebase's conventions, not against general best practices. The question isn't "is this good code?" but "is this good code for our system?"
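To make the second bullet concrete, here's a minimal sketch assuming a node-postgres setup and the db/pool.ts module mentioned later in this post; the export name and query are illustrative.

import { Client } from 'pg';
import { pool } from './db/pool'; // the shared pool this post refers to; export name assumed

// Context-blind: generically correct, but opens and closes a fresh
// connection on every call, which exhausts connections under load.
export async function getUserContextBlind(id: string) {
  const client = new Client();
  await client.connect();
  const result = await client.query('SELECT * FROM users WHERE id = $1', [id]);
  await client.end();
  return result.rows[0];
}

// Contextually correct: reuses the pool the rest of the codebase already uses.
export async function getUser(id: string) {
  const result = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return result.rows[0];
}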
6. Copy-paste coherence issues
When AI generates multiple related functions or components, they often have a copy-paste quality — similar structure, similar variable names, similar comments. Each individual piece looks fine, but together they suggest the AI generated them from the same template rather than understanding the relationship between them.
What to do: Look at the PR holistically, not just file-by-file. Do the pieces fit together coherently? Do abstractions make sense when you consider the full picture? Are there missed opportunities to share code between similar components?
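A small, made-up example of the pattern: each function passes review on its own, but side by side they're the same template twice.

// Hypothetical endpoints -- each function is fine in isolation.
async function fetchUsers(): Promise<unknown[]> {
  const res = await fetch('/api/users');
  if (!res.ok) throw new Error(`Failed to fetch users: ${res.status}`);
  return res.json();
}

async function fetchOrders(): Promise<unknown[]> {
  const res = await fetch('/api/orders');
  if (!res.ok) throw new Error(`Failed to fetch orders: ${res.status}`);
  return res.json();
}

// Seen together, they collapse into one helper:
async function fetchResource(path: string): Promise<unknown[]> {
  const res = await fetch(`/api/${path}`);
  if (!res.ok) throw new Error(`Failed to fetch ${path}: ${res.status}`);
  return res.json();
}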
The review checklist
When reviewing a PR that contains AI-generated code, run through these questions:
Approach (5 min)
- Is this the right approach for our system? (Not just "a" right approach)
- Does it align with existing patterns in the codebase?
- Would I have architected this differently? If so, does the difference matter?
Correctness (10-15 min)
- Does the business logic actually handle all the cases it should?
- Are error paths handled correctly? (Not just caught — handled meaningfully)
- Are there race conditions, edge cases, or boundary conditions the AI might have missed?
- Do external API calls use the correct, current API signatures?
Integration (5 min)
- Does this use existing utilities and abstractions? (Or does it reinvent them?)
- Does it follow the codebase's conventions for error handling, logging, and config?
- Will this cause issues with existing code it interacts with?
Tests (5 min)
- Do tests cover meaningful scenarios, not just happy paths?
- Are edge cases tested?
- Would these tests catch a regression if someone changed the implementation?
Simplicity (2 min)
- Is there unnecessary abstraction?
- Can any of this be simplified without losing functionality?
- Is the code proportional to the problem it solves?
How to give feedback on AI-generated code
Reviewing AI-generated code requires a slightly different communication style. The author may not have made the decisions you're questioning — the AI did. This changes the dynamic.
Don't ask "why did you do it this way?" The honest answer might be "the AI generated it and it seemed fine." Instead, frame feedback as alternatives: "Have you considered using our existing PaymentService here instead of the inline implementation? It handles retry logic already."
Explain the context the AI missed. When you flag an issue, explain why it's wrong for your system, not just that it's wrong. "This approach uses a new DB connection per request — we use the connection pool from db/pool.ts to avoid exhausting connections under load."
Focus on what matters. AI-generated code will have style differences from hand-written code. Unless your team has strong conventions and a linter that enforces them, let minor style issues go. Spend your review capital on correctness, architecture, and integration.
Teach, don't gatekeep. If a developer is learning to use AI tools effectively, review feedback is how they learn what to scrutinize in AI output. "The AI missed this edge case" is more helpful than "this is wrong" because it teaches the developer what to check next time.
The bigger picture
AI-generated code isn't going away. The volume will increase. The quality will improve — but slowly, and never to the point where human review becomes unnecessary. The teams that build strong review practices now will have a lasting advantage.
The bottleneck isn't the review itself. It's everything around it: knowing which PRs need attention, getting notifications to the right reviewer, tracking what's been reviewed and what's stalling. Build the review skill. Let tooling handle the plumbing.