
How to Run an AI Skills Assessment for Your Operations Team

A practical AI skills assessment any COO can run in one week: four dimensions, 20-30 minutes per person, and a development plan for each capability level.

Phos Team
Operations · AI Strategy · Hiring

A team member who opens Claude fifteen times per week but uses it exclusively to ask it to “make this email shorter” is not an AI-capable operations professional.

A team member who opens it three times per week to draft the compliance report from data exports, review the output against their professional standard, and adjust the inputs where needed, is the AI-capable operations professional.

Volume and capability are different measurements. The skills assessment measures the second.

This article describes a practical AI skills assessment that a COO or operations director can run with their team in one week: no external assessment tool, no formal evaluation framework, no certification required.

The assessment produces a specific picture of where each team member stands on four AI skill dimensions, and a specific training plan for each skill gap identified.


The Four Skill Dimensions — What the Assessment Measures and Why

Self-reported AI confidence is not a reliable skill signal. The team member who says they are confident with AI and the team member who is actually capable with AI are not the same population.

Each dimension below has specific, observable indicators. No survey required.


Dimension 1: Input Quality

What it measures: can the team member structure an AI input that produces a useful output without extensive editing?

Why it matters: this is the single most common capability gap in non-technical teams. Team members who use AI but get consistently mediocre outputs are almost always providing inputs that are too general, too brief, or missing the context the AI needs to produce a specific, useful output.

The indicators:

| Level | What the input looks like |
| --- | --- |
| High-capability | Specific about context (who the output is for, what it needs to accomplish, relevant background); includes format, tone, and length constraints |
| Developing | Partially specific: describes the task but not the context, or includes context but not the constraints |
| Foundational | Generic task requests (“write an email about the delay”) without context, specificity, or constraints |
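
The elements in the table above can be treated as a checklist. The sketch below is one possible way to do that, not part of the assessment itself: the field names (`audience`, `goal`, `background`, `output_format`, `tone`, `length`), the rule for mapping filled fields to a level, and the example values are all assumptions made for illustration. The assessor's own observation remains the actual measure.

```python
# Hypothetical field names that mirror the elements listed in the rubric.
CONTEXT_FIELDS = ("audience", "goal", "background")
CONSTRAINT_FIELDS = ("output_format", "tone", "length")


def filled(fields, draft):
    """Return the names in `fields` that the draft input actually fills in."""
    return [name for name in fields if draft.get(name, "").strip()]


def rough_input_level(draft):
    """Heuristic only: map which elements are present to a rubric level."""
    context = filled(CONTEXT_FIELDS, draft)
    constraints = filled(CONSTRAINT_FIELDS, draft)
    if len(context) == len(CONTEXT_FIELDS) and len(constraints) == len(CONSTRAINT_FIELDS):
        return "High-capability"
    if not context and not constraints:
        return "Foundational"
    return "Developing"


def render_input(draft):
    """Assemble the task and every filled-in element into one prompt string."""
    lines = [draft["task"]]
    for name in CONTEXT_FIELDS + CONSTRAINT_FIELDS:
        value = draft.get(name, "").strip()
        if value:
            lines.append(f"{name.replace('_', ' ').capitalize()}: {value}")
    return "\n".join(lines)


# Invented example inputs, for illustration only.
generic = {"task": "Write an email about the delay."}
specific = {
    "task": "Write an email about the delay.",
    "audience": "The client's project lead",
    "goal": "Explain the new delivery date and keep the relationship on good terms",
    "background": "A supplier shipment arrived one week late",
    "output_format": "Plain email with a one-line subject",
    "tone": "Direct and apologetic, no jargon",
    "length": "Under 150 words",
}

print(rough_input_level(generic))
print(rough_input_level(specific))
print(render_input(specific))
```

The point of the comparison is that both drafts start from the same task sentence; what separates them is the context and the constraints around it.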

Dimension 2: Output Evaluation

What it measures: can the team member assess whether an AI output meets the quality standard, and specifically identify what is missing or wrong?

Why it matters: a team member who accepts AI output without evaluating it creates risk for the organisation, because generic, inaccurate, or tone-inappropriate output reaches the client, the funder, or the regulator uncorrected.

The indicators:

| Level | What the evaluation looks like |
| --- | --- |
| High-capability | Identifies specific elements that do not meet the standard (incorrect technical term, tone inconsistency, missing key point) and can describe what the correct version would look like |
| Developing | Evaluates at the overall level (“this seems okay” or “this doesn’t sound like us”) but cannot identify specific elements or describe specific corrections |
| Foundational | Accepts or rejects the output holistically: uses it without review or discards it without understanding why it was inadequate |

Dimension 3: Workflow Identification

What it measures: can the team member identify tasks beyond the trained workflow set where AI would be appropriate?

Why it matters: a team that can use AI only on the workflows it was explicitly trained on is operating at the level of AI compliance. A team that can identify new, appropriate AI tasks is moving toward fluency, the self-expanding AI capability that produces compounding returns.

The distinction between AI fluency and AI compliance matters here: workflow identification is one of the clearest signals of which side of that line a team member is on. The team’s overall level of AI maturity also matters, because it shapes what each assessment dimension reveals.

The indicators:

| Level | What the identification looks like |
| --- | --- |
| High-capability | Has identified and attempted at least one workflow beyond the trained set; can describe the reasoning for why that task was AI-appropriate (structured inputs, defined output, catchable errors) |
| Developing | Can describe tasks beyond the trained set they have considered for AI use, but has not attempted them |
| Foundational | Cannot identify tasks beyond the trained set, or describes tasks that are not AI-appropriate (high-judgment, relationship-dependent, or safety-critical tasks) |

Dimension 4: Improvement Loop

What it measures: when the AI output is not adequate, does the team member adjust their approach and run the workflow again?

Why it matters: the improvement loop is what distinguishes fluency from compliance. The team member who adjusts and iterates is developing judgment about what AI needs to produce good outputs.

The indicators:

| Level | What the loop looks like |
| --- | --- |
| High-capability | Can describe a specific situation where the first AI output was not adequate, what they changed in the input or approach, and what the second output looked like |
| Developing | Has experienced inadequate outputs but typically resolves them by editing the output manually rather than adjusting the AI approach |
| Foundational | Does not adjust the AI approach when outputs are inadequate: accepts, discards, or reverts to manual methods |

The Assessment Session — How to Run It in 20 to 30 Minutes per Person

Preparation (Before the Session)

Identify two tasks for the session:

  • Task A: a task the team member has used AI for before (to assess input quality and output evaluation)
  • Task B: a task they have not used AI for before, from their actual work queue (to probe workflow identification)

Prepare two questions:

  1. “Describe the last time an AI output was not quite what you needed. What did you do?”
  2. “Is there a task you have been thinking about trying AI on that you haven’t tried yet? Why haven’t you tried it?”

The Session Structure

Part 1: Live Workflow Observation (10 to 15 Minutes)

Ask the team member to complete Task A using AI, with you observing. Do not assist. Do not coach.

Observe specifically:

  • How does the team member structure the input? (Dimension 1)
  • How do they review the output? (Dimension 2)
  • If the output is not adequate, what do they do? (Dimension 4)

Take brief notes. Do not score in real time. Review after the session.

Part 2: Workflow Identification and Improvement Loop Questions (5 Minutes)

Ask the two prepared questions. Listen for:

  • Specific examples of output adjustment and iteration (high-capability on Dimension 4)
  • Vague answers like “I just edited it” (developing on Dimension 4)
  • Tasks beyond the trained set the team member has considered (Dimension 3)
  • Tasks that are not AI-appropriate (foundational on Dimension 3)

Part 3: Close the Session (5 Minutes)

Thank the team member and close. Do not share assessment scores during the session. Share them in a brief follow-up conversation with the development plan attached.


Scoring (After the Session)

Rate each of the four dimensions:

| Score | Level |
| --- | --- |
| 3 | High-capability |
| 2 | Developing |
| 1 | Foundational |

Overall capability category:

| Total score | Category |
| --- | --- |
| 10 to 12 | High-capability |
| 6 to 9 | Developing |
| 4 to 5 | Foundational |
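
With four dimensions each scored from 1 to 3, totals run from 4 to 12, so the three bands above cover every possible result. A minimal sketch of that arithmetic follows; the dimension keys and the function name `overall_category` are illustrative choices, not part of the assessment.

```python
# Illustrative keys for the four dimensions described earlier.
DIMENSIONS = (
    "input_quality",
    "output_evaluation",
    "workflow_identification",
    "improvement_loop",
)


def overall_category(scores):
    """Sum four 1-3 dimension scores and map the total to a category."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError(f"expected scores for exactly: {', '.join(DIMENSIONS)}")
    if any(score not in (1, 2, 3) for score in scores.values()):
        raise ValueError("each dimension score must be 1, 2, or 3")
    total = sum(scores.values())
    if total >= 10:
        category = "High-capability"
    elif total >= 6:
        category = "Developing"
    else:  # only totals of 4 or 5 remain
        category = "Foundational"
    return total, category


# Invented example: strong on the basics, weaker on workflow identification.
example = {
    "input_quality": 2,
    "output_evaluation": 2,
    "workflow_identification": 1,
    "improvement_loop": 2,
}
print(overall_category(example))
```

Keeping the per-dimension scores alongside the total is worthwhile, because the development plan depends on which dimension is weak, not only on the overall band.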

The Development Plan (Shared Within 48 Hours)

The development plan converts assessment scores into a specific training sequence. For the full programme structure that supports each capability category, the companion guide on how to train a non-technical team on AI maps the training investment required at each level.

High-capability (score 10 to 12):

“You are one of the team’s strongest AI users. Your next step is to own [specific new workflow] and to be the person your colleagues ask when they are trying to improve their approach on [the workflow area where you are strongest]. I would like you to run two peer sessions in the next 30 days.”

Developing (score 6 to 9):

“You are using the trained workflows well. Your next step is the improvement loop: the practice of adjusting your AI input when the output is not right rather than editing the output manually. Your next session will focus on this specifically. I will also suggest one new workflow to try in the next two weeks.”

Foundational (score 4 to 5):

“Let’s run a second anchor workflow session using [specific task]. This session will focus specifically on input structure: what information to include to get useful outputs. I will be there for the session.”
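
For a team of any size it can help to generate the plan skeleton from the category, then fill in the bracketed specifics by hand. The sketch below paraphrases the three templates above into short next steps and adds the 48-hour sharing deadline; the function name, the dictionary layout, and the example name and date are assumptions for illustration.

```python
from datetime import datetime, timedelta

# Next steps paraphrased from the three templates; wording is illustrative.
NEXT_STEPS = {
    "High-capability": [
        "Own one specific new workflow",
        "Be the colleague others ask about your strongest workflow area",
        "Run two peer sessions in the next 30 days",
    ],
    "Developing": [
        "Practise the improvement loop: adjust the input instead of editing the output",
        "Try one suggested new workflow in the next two weeks",
    ],
    "Foundational": [
        "Run a second anchor workflow session focused on input structure, with the assessor present",
    ],
}


def development_plan(name, category, session_end):
    """Build a plan skeleton and the deadline for sharing it (48 hours)."""
    if category not in NEXT_STEPS:
        raise ValueError(f"unknown category: {category}")
    return {
        "name": name,
        "category": category,
        "next_steps": list(NEXT_STEPS[category]),
        "share_by": session_end + timedelta(hours=48),
    }


# Invented example person and session time.
plan = development_plan("Team member A", "Developing", datetime(2025, 3, 3, 15, 0))
print(plan["share_by"])
for step in plan["next_steps"]:
    print("-", step)
```

The specific workflow or task named in each plan still has to come from the assessor's notes; the skeleton only guarantees that nobody leaves the assessment without a next step and a date.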


What the Assessment Finds — and What to Do With the Findings

The Typical Distribution at 90 Days Post-Training

| Category | Typical share | Primary need |
| --- | --- | --- |
| High-capability | 20 to 30% | Ownership of new workflows; peer teaching role |
| Developing | 40 to 50% | Improvement loop practice; one new workflow to expand into |
| Foundational | 20 to 30% | Second anchor session; input quality coaching |

The developing group is the highest-leverage investment. They have the foundation and the trained workflows. They need improvement loop practice and workflow expansion support to move to high-capability.
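
To compare your own team against the typical distribution, count the categories and convert them to percentages. This is a minimal sketch under stated assumptions: the function name `distribution`, the `within_typical` flag, and the example team are all invented for illustration, and the ranges are simply the ones quoted above.

```python
from collections import Counter

# Typical shares at 90 days post-training, as quoted in the article (percent).
TYPICAL_SHARE = {
    "High-capability": (20, 30),
    "Developing": (40, 50),
    "Foundational": (20, 30),
}


def distribution(categories):
    """Return each category's share of the team and whether it is in the typical range."""
    if not categories:
        raise ValueError("at least one assessed team member is required")
    counts = Counter(categories)
    total = len(categories)
    report = {}
    for category, (low, high) in TYPICAL_SHARE.items():
        share = round(100 * counts[category] / total, 1)
        report[category] = {"share": share, "within_typical": low <= share <= high}
    return report


# Invented example: a ten-person team.
team = ["High-capability"] * 3 + ["Developing"] * 5 + ["Foundational"] * 2
for category, result in distribution(team).items():
    print(category, result)
```

A share outside the typical range is a prompt for a closer look rather than a verdict, particularly on small teams where one person moves the percentage a long way.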


Three Team-Level Actions That Follow the Assessment

Action 1: Activate High-Capability Team Members as Peer Teachers

Assign each high-capability team member one developing colleague to support:

  • A ten-minute check-in once per week
  • A shared workspace where they can review each other’s outputs
  • Explicit acknowledgment that this is a valued role, not additional work

The high-capability team member who teaches is reinforcing their own fluency through the act of teaching, and producing peer advocacy that is more effective than any trainer at changing colleague behaviour.
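
The pairing itself is simple enough to sketch. The example below assigns each high-capability team member one developing colleague in list order and reports anyone left without a peer teacher. The article does not say how to handle leftovers or how to choose who pairs with whom, so the ordering rule, the function name, and the names are assumptions.

```python
def pair_peer_teachers(high_capability, developing):
    """Pair each high-capability member with one developing colleague, in list order."""
    pairs = list(zip(high_capability, developing))
    # Developing colleagues beyond the number of available peer teachers.
    unpaired = developing[len(pairs):]
    return pairs, unpaired


# Invented names for illustration.
pairs, unpaired = pair_peer_teachers(["Ana", "Ben"], ["Cy", "Di", "Eve"])
print(pairs)
print(unpaired)
```

In practice the order of each list is where the judgment goes: pairing people who share a workflow area is likely to make the weekly check-in more useful than an arbitrary match.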

Action 2: Design the Improvement Loop Practice Session for Developing Team Members

A 30-minute session specifically focused on Dimension 4.

The AI system owner provides three examples of AI outputs that are not quite right and asks the developing team member to:

  1. Diagnose what is missing
  2. Adjust the input
  3. Evaluate the new output

This is a skill-building session, not a task completion session. The goal is to build the adjustment reflex.

Action 3: Schedule Second Anchor Workflow Sessions for Foundational Team Members

Use a different task from the first session, selected based on the assessment finding:

  • If input quality was the gap: select a task with highly structured inputs (a batch of similar communications rather than a complex one-off document)
  • If output evaluation was the gap: select a task where the quality standard is easy to verify (a document with a clear format requirement rather than a judgment-intensive creative task)
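
The two selection rules above can be read straight off the dimension scores. The sketch below assumes that "the gap" means a score of 1 (Foundational) on that dimension, which is an interpretation rather than something the assessment states; the dictionary keys and function name are also illustrative.

```python
# Guidance paraphrased from the two rules above.
ANCHOR_TASK_GUIDANCE = {
    "input_quality": "Pick a task with highly structured inputs, "
                     "such as a batch of similar communications",
    "output_evaluation": "Pick a task whose quality standard is easy to verify, "
                         "such as a document with a clear format requirement",
}


def second_anchor_guidance(scores):
    """Return the task-selection guidance for each covered dimension scored 1."""
    return [
        guidance
        for dimension, guidance in ANCHOR_TASK_GUIDANCE.items()
        if scores.get(dimension) == 1  # assumption: a score of 1 marks "the gap"
    ]


# Invented example scores.
print(second_anchor_guidance({"input_quality": 1, "output_evaluation": 2}))
```

If both dimensions score 1, the function returns both pieces of guidance, and the assessor decides which gap to address first.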

Common Questions on the AI Skills Assessment

“Should I share individual assessment scores with the whole team?”

No. Share scores individually and privately. The development plan is the output that matters, and it is specific to each person.

The team-level communication: share the overall distribution (“we have about a third of the team at high-capability, half developing, and the rest building foundations”) and the team-level actions without naming individuals. This frames the assessment as a development investment, not a performance evaluation.

“What if the highest-capability team member is not the person I expected?”

This is the most common and most valuable assessment finding. The technically enthusiastic person and the AI-capable person are not always the same.

The capability indicators in this assessment (improvement loop, workflow identification, specific output evaluation) often reveal unexpected high-capability team members.

Act on the finding: assign them the peer teacher role regardless of seniority or prior reputation. The respected frontline team member who is genuinely AI-capable is the most effective peer advocate in the organisation.

“What if the assessment reveals that the AI system owner is foundational?”

Address it directly. The AI system owner who is not AI-capable cannot run the improvement loop, cannot maintain the context pack effectively, and cannot support foundational team members in their development.

The immediate action: pair the AI system owner with a high-capability team member for a structured 30-day mentoring period. Reassess at the end of 30 days. If the capability gap persists, consider reassigning the AI system owner role to a team member the assessment identified as high-capability.

“How often should the assessment be re-run?”

Once at 90 days post-training (the diagnostic described in this article) and once at 12 months (the annual review).

Between assessments: use the weekly team meeting AI output question (“what AI output this week was the most useful? What needed the most adjustment?”) as a lightweight continuous signal.


Want the AI Skills Assessment Run for Your Operations Team — and Development Plans Produced for Each Capability Category — Within the Next Two Weeks?

The AI skills assessment measures what usage logs cannot:

  • Input quality
  • Output evaluation
  • Workflow identification
  • The improvement loop

It takes 20 to 30 minutes per team member and produces a specific development plan for each of the three capability categories.

The assessment turns “how is the AI implementation going?” from a question answered by impressions into one answered by evidence.

Path one: run the assessment this week. Pick one team member in each role. Run the 30-minute session using the format above. Score the four dimensions after the session. Produce the development plan from the templates. Share within 48 hours.

Path two: bring in a partner. Phos AI Labs runs the AI skills assessment and produces the development plans and targeted follow-up sessions for each capability category. Thirty minutes, no deck. Start here.
