How to Know If Your Engineers Are Using AI Well

How to spot whether engineers are using AI as a thinking partner or just accepting bad output: four observable signals and a 30-minute quality audit for any team.

Phos Team

You can tell if your engineers are using AI. The question you cannot answer yet is whether they are using it well.

“We use Claude for code review” tells you about adoption. It tells you nothing about whether the code review is getting better, getting worse, or just reaching the same mediocre standard faster.

The most dangerous pattern in AI-assisted engineering is not the engineer who refuses to use AI. It is the engineer who uses it constantly and ships everything it produces without reading it carefully. The output looks done. The review passes. The bug ships. The architecture drifts.

The problem is not adoption. It is the absence of a quality bar that anyone can name.


The Pattern Nobody Is Talking About: Confident Mediocrity at Speed

There are two failure modes in AI-assisted engineering.

The first is obvious: engineers who do not use AI at all and are falling behind. Most engineering leaders have addressed this one.

The second is subtler and more dangerous: engineers who use AI constantly but have gradually lowered their quality bar to match what the AI produces by default. They are shipping faster. The output passes review. But the architecture is muddier, the edge cases are underhandled, and the documentation says what the code does rather than why it does it.

What confident mediocrity looks like in practice:

  • The PR description is clearly AI-generated and says what the code does, not what problem it solves or what tradeoffs were made
  • The code review comment from the AI is accepted without pushback, even when it misses the context of why the code was written that way
  • The test coverage number looks fine but the tests cover the obvious path, not the failure modes
  • When asked “what would break if this assumption changed?”, the engineer pauses longer than they should

None of these are firing offences. All of them are signals that AI is being used as a shortcut rather than a thinking partner.


The Difference Between AI as a Shortcut and AI as a Thinking Partner

| AI as a shortcut | AI as a thinking partner |
| --- | --- |
| “Generate a function that does X”: accept and ship | “Generate a function that does X”: read it, identify the tradeoffs, consider the edge cases, decide whether the approach is right for this context |
| Accepts every code review suggestion | Evaluates each suggestion: is the AI right here? Does it understand the context? Pushes back when it does not |
| Uses AI output as the answer | Uses AI output as the first draft of the answer |
| Cannot explain why the code works the way it does | Can explain the tradeoffs, the edge cases, and why this approach over alternatives |
| Test coverage added by AI to hit the number | Test coverage designed to catch real failure modes; AI helps write the tests once the failure modes are identified |

The core distinction:

An engineer using AI as a thinking partner is doing more thinking, not less. They are using AI to generate options faster, catch things they might have missed, and reduce the mechanical work of writing boilerplate.

But the judgment — what the right approach is, what would break this, what the edge cases are — remains theirs.

An engineer using AI as a shortcut is outsourcing judgment to a model that has no context about the codebase, the business requirements, or the specific constraints of the situation.

The output looks like engineering. The thinking is absent.


Four Observable Signals That Reveal Which Pattern You Have

These four signals are visible in the artifacts that already exist in an engineering workflow: PRs, code review comments, commit messages, documentation, and test suites.

No new monitoring required.

Signal 1: PR Description Quality

What to look for: does the PR description explain only what the code does, or also what problem it solves, what tradeoffs were made, and what was considered and rejected?

AI-generated PR descriptions typically cover only the first. A thoughtful engineer covers the other three as well.

The test: could a senior engineer read this PR description and understand why these choices were made, without reading the code? If not, the description is a transcript of what the AI produced, not a communication of engineering judgment.

Signal 2: Code Review Response Patterns

What to look for: when the engineer receives a code review comment (from a human or an AI assistant), do they engage with it or accept it?

Engagement looks like:

  • “I considered that approach but chose this one because of X constraint”
  • “You’re right, I missed that edge case — here’s the fix and here’s why it works”

Acceptance looks like:

  • Applying the suggestion without comment

A team that accepts every code review suggestion without pushback has either outsourced its judgment to the reviewer or, more commonly, applied AI-generated review suggestions without evaluating whether they are correct for this specific context.

Signal 3: Test Design vs. Test Coverage

What to look for: are the tests testing the obvious path, or the failure modes?

AI is very good at generating tests for the happy path. AI is less reliable at identifying the failure modes worth testing:

  • What happens when the assumption changes?
  • What happens at the edge of the range?
  • What happens when the upstream dependency behaves unexpectedly?

A test suite that hits 85% coverage but tests only the scenarios the AI found obvious is not an 85% covered codebase. It is a codebase where the 15% not covered contains all the interesting failure modes.
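A minimal sketch of the gap between coverage and test design. The function `average_latency_ms` and its tests are hypothetical, invented here for illustration. The first test alone executes every line of the function, so line coverage already reads 100%; the other three ask the questions listed above.

```python
import math
from typing import Sequence


def average_latency_ms(samples: Sequence[float]) -> float:
    """Mean latency in milliseconds (hypothetical function under test)."""
    return sum(samples) / len(samples)


def test_happy_path() -> None:
    # One call executes every line of the function: full line coverage.
    assert average_latency_ms([10.0, 20.0, 30.0]) == 20.0


def test_empty_input() -> None:
    # The assumption changes: the upstream service returns no samples.
    try:
        average_latency_ms([])
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError on empty input")


def test_single_sample() -> None:
    # Edge of the range: the smallest valid input.
    assert average_latency_ms([42.0]) == 42.0


def test_nan_from_upstream() -> None:
    # Upstream misbehaves: one NaN sample silently turns the mean into NaN.
    assert math.isnan(average_latency_ms([10.0, float("nan")]))
```

All four tests pass against the code as written, and the last three add nothing to the coverage number. What they add is a record of decisions someone has to own: whether an empty input should raise `ZeroDivisionError` or something more descriptive, and whether a NaN should propagate or be rejected.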

Signal 4: The “Explain It Back” Conversation

What to look for: when an engineer walks through a piece of their own AI-assisted code, can they explain the tradeoffs, not just what it does?

This is the fastest single diagnostic. It does not require a formal review. It can happen in a standup, a 1:1, or a design review.

Ask: “What would happen if [specific assumption] changed?” or “Why did you approach it this way rather than [alternative]?”

  • The engineer who used AI as a thinking partner has an answer. They considered alternatives and have a reason for the choice.
  • The engineer who accepted the AI’s output has to re-derive the answer on the spot, and sometimes cannot.

How to Run the Quality Audit: A Structured 30-Minute Review

This audit is not a code review. It is a quality signal review. It takes 30 minutes per engineer and produces a clear read on whether each person is using AI as a thinking partner or a shortcut.

The audit structure:

Pick one PR from the last two weeks for each engineer. Ideally one that was AI-assisted. Review it against four questions:

| Question | Thinking partner signal | Shortcut signal |
| --- | --- | --- |
| Does the PR description explain tradeoffs? | Yes: specific choices explained with reasoning | Describes what it does, not why |
| Are code review responses engaged or passive? | Engaged: explains reasoning or pushes back | Passive: accepted without comment |
| Do the tests cover failure modes? | Yes: tests for edge cases and failure scenarios | Tests the happy path only |
| Can the engineer explain the approach in 5 minutes? | Yes: can articulate tradeoffs and alternatives | Re-derives or defers to “the AI suggested it” |

Score each signal: 1 (shortcut pattern), 2 (mixed), 3 (thinking partner). Total out of 12.

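| Score | What it means |
| --- | --- |
| 10–12 | Using AI as a thinking partner. Risk is complacency; keep the quality bar visible. |
| 7–9 | Mixed pattern. Specific signals are degraded. Worth a direct conversation. |
| 4–6 | Shortcut pattern dominant. Needs a clearer quality bar, not a reprimand. |
| Below 4 | Cannot occur when all four signals are scored 1 to 3, since the lowest total is 4. Work that falls below even the shortcut pattern is not an AI quality problem; it is an engineering fundamentals conversation. |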
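The scoring can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed tool: the signal keys, `AuditResult`, and `score_audit` are hypothetical names. It also makes one property of the rubric visible: with four signals scored 1 to 3, the total ranges from 4 to 12, so the "Below 4" band is never reached.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical short names for the four signals in the audit table.
SIGNALS = ("pr_description", "review_responses", "test_design", "explain_it_back")


@dataclass(frozen=True)
class AuditResult:
    total: int
    reading: str


def score_audit(scores: Dict[str, int]) -> AuditResult:
    """Sum four per-signal scores (1 to 3 each) and map the total to a band."""
    missing = [name for name in SIGNALS if name not in scores]
    if missing:
        raise ValueError(f"missing signals: {missing}")
    for name in SIGNALS:
        if scores[name] not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {scores[name]}")

    total = sum(scores[name] for name in SIGNALS)
    if total >= 10:
        reading = "thinking partner"
    elif total >= 7:
        reading = "mixed pattern"
    else:
        # Totals of 4 to 6. The validation above guarantees a minimum of 4,
        # so no branch is needed for totals below 4.
        reading = "shortcut pattern dominant"
    return AuditResult(total=total, reading=reading)


result = score_audit(
    {"pr_description": 2, "review_responses": 3, "test_design": 1, "explain_it_back": 2}
)
print(result.total, result.reading)  # 8 mixed pattern
```

The total is a prompt for a conversation, not a grade. In the example, the useful information is the single 1 on test design, which tells you where to start.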

The Conversation That Fixes It

The conversation should never start with “you’re accepting bad AI output.” It should start with the quality bar (what good looks like) and work backward to where the current work falls short of it.

What to say:

“I want to talk about what great AI-assisted engineering looks like on this team, because I don’t think we’ve been explicit enough about it. Great AI-assisted code is code where you’ve evaluated what the model produced, made deliberate choices about the tradeoffs, and can explain those choices to anyone who reads the code later. I want to understand what your process looks like when you’re using AI for [specific workflow] — walk me through it.”

Then listen. The engineer’s description of their process will reveal whether judgment is present or absent.

What not to say:

  • “You’re just copy-pasting from Claude and shipping it” — accusatory and usually inaccurate; the problem is typically gradual drift, not conscious laziness
  • “We need more code review of AI output” — adds friction without adding judgment; the problem is not catching AI errors after the fact, it is developing the habit of evaluating AI output before accepting it
  • “Stop using AI for this” — wrong corrective; the goal is better AI use, not less AI use

The framing that works:

The quality bar for AI-assisted engineering is the same as the quality bar for all engineering.

The code should be correct. The failure modes should be understood. The choices should be explainable.

AI changes the speed at which engineers can reach that bar. It does not lower the bar.


The Quality Bar Document: What to Build So This Conversation Happens Once

The single most effective intervention for the confident-mediocrity pattern is not monitoring, not more code review, and not restricting AI use.

It is making the quality bar explicit in writing, so every engineer knows what “using AI well” looks like on this team before they submit a PR.

What the quality bar document contains:

Section 1 — What AI-assisted code review looks like here

Three to five specific examples of acceptable and unacceptable behaviour.

  • Unacceptable: “Accepted without comment”
  • Acceptable: “Agreed with this suggestion because X” or “Declined this suggestion because Y”

Section 2 — What good PR descriptions contain

The template for a PR description on this team:

Problem: what issue does this solve?
Approaches considered: what alternatives were evaluated?
Tradeoffs made: why this approach over others?
Known limitations: what edge cases exist?
Tests written: what failure modes do the tests cover and why?
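If the team wants a mechanical reminder, the template can be checked before review. The sketch below is one possible approach, assuming descriptions use `Section: text` lines exactly as in the template; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names, and nothing here is tied to a particular code hosting tool.

```python
import re
from typing import List

# Section names taken from the PR description template above.
REQUIRED_SECTIONS = (
    "Problem",
    "Approaches considered",
    "Tradeoffs made",
    "Known limitations",
    "Tests written",
)


def missing_sections(description: str) -> List[str]:
    """Return template sections that are absent or have no text on their line."""
    missing = []
    for section in REQUIRED_SECTIONS:
        # "Section:" at the start of a line, followed by at least one
        # non-whitespace character on that same line.
        pattern = rf"^[ \t]*{re.escape(section)}[ \t]*:[ \t]*\S"
        if not re.search(pattern, description, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(section)
    return missing


description = """Problem: checkout times out under load
Approaches considered: caching, batching
Tradeoffs made:
Tests written: timeout and retry failure modes
"""
print(missing_sections(description))  # ['Tradeoffs made', 'Known limitations']
```

A check like this only confirms that each section was filled in. It cannot tell whether the reasoning in it is real, so it supports the reviewer's judgment rather than replacing it. It also treats a section whose text starts on the following line as empty, which is a limitation of the single-line assumption.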

Section 3 — Test design standard

The explicit expectation: tests should cover the happy path and the three most likely failure modes.

AI can be used to write the tests once those failure modes are identified. Using AI to identify which failure modes to test is acceptable as a starting point; the engineer signs off on the final list.

Section 4 — The “explain it back” standard

Every engineer should be able to walk a colleague through any piece of their AI-assisted code (what it does, why it does it that way, and what would break it) in under five minutes.

This document is introduced in onboarding and referenced in code review. When a PR falls short of the standard, the conversation references the document, not the manager’s personal judgment.


Common Questions on Managing AI Quality in Engineering

“What if my engineers resist being asked to explain their AI-assisted code?”

Resistance usually signals anxiety, not incompetence. The engineer who cannot explain their code is not trying to hide poor work; they are often genuinely unsure of the decisions the AI made on their behalf.

Reframe the expectation: “I want to make sure we all understand the code we ship, regardless of how it was produced.” That is a quality standard everyone can sign up to, not a performance threat.

“How do I distinguish an AI shortcut from an engineer who is just fast?”

Ask the engineer to walk through the code. The fast engineer who used AI as a thinking partner has a fluent answer. The engineer who accepted AI output without evaluation stumbles when the question is specific.

Speed is not the signal. The ability to explain the choices is.

“Should we restrict which AI tools engineers can use?”

No, with one exception. If a tool sends code to an external service in a way that violates data governance requirements, restrict that specific tool for that specific data type.

Otherwise: restrict by quality standard, not by tool. The quality bar document applies regardless of which AI the engineer used.

“How do I handle a senior engineer whose quality has degraded?”

Have the quality bar conversation privately and specifically. Reference the signals you have observed. Invite their perspective on what their process looks like.

Senior engineers often respond well to the framing: “You have always had a high quality bar. I want to make sure the way we’re using AI here matches that standard.”

“Does this framework apply to architecture decisions and system design, not just code?”

Yes, with adjusted signals. For architecture decisions, the “explain it back” test is: can the engineer articulate why this architecture over the alternatives, what the failure modes are, and what would need to change if the business requirements shifted?

AI is increasingly used to generate architectural options. The judgment layer, evaluating those options against the specific context, remains human.

“What is the right ratio of AI-assisted to manual code?”

This is not the right metric. The right metric is output quality and the engineer’s ability to explain and own the code, regardless of how it was produced.

A codebase where 80% was AI-assisted and all of it is correct, well-tested, and explainable is better than one where 20% was AI-assisted and none of it is understood by the person who wrote it.


Want to Build the Quality Standard That Makes AI Use Compound Across Your Engineering Team?

The team that uses AI well is not the team that uses it most. It is the team that holds the quality bar regardless of how the code was produced, and has made that bar explicit enough that every engineer knows what it means to meet it.

Path one: write the quality bar document this week. Draft the four sections above for your team. Introduce it in the next code review session. The conversation it enables is worth more than another round of tooling decisions.

Path two: bring in a partner. If you want the quality standard, the workflow documentation, and the team training built together as a system that compounds, that is the work Phos AI Labs does across Phases 1 and 2. The fastest way to know if it is the right fit is a conversation. Thirty minutes, no deck.
