Not every workflow is equally ready for AI.
Some are ready today: high frequency, clear inputs, defined outputs, minimal judgment required.
Some are almost ready: one prerequisite away from being AI-appropriate. Some belong permanently in the human layer: too judgment-intensive, too relationship-sensitive, or too high-stakes for the consequences of an AI error to be acceptable.
The fastest path to running AI workflows is identifying which category each workflow sits in before building anything. The slowest path is building in the wrong order and debugging the fallout.
This article gives a specific audit framework (four evaluation dimensions scored against a simple rubric) that produces a prioritised AI workflow list in about two hours.
The highest-scoring workflows are the ones to build first. The lowest-scoring ones are the ones to leave human. The middle-tier workflows are the ones that need a specific prerequisite before AI can handle them.
The four audit dimensions: what to evaluate for each workflow
Dimension 1: Frequency and volume
What it measures: how often the workflow runs and how many instances occur per week.
Why it matters: AI investment in a workflow produces returns proportional to how often the workflow runs. A workflow that saves 20 minutes per run produces more weekly return at 20 runs per week than at 1 run per week. High-frequency workflows justify the build investment quickly; low-frequency workflows may not justify it at all.
| Frequency | Score |
|---|---|
| Daily or multiple times per day | 3 |
| 3–5 times per week | 2 |
| Weekly | 1 |
| Monthly or less | 0 |
The volume modifier: a workflow that runs once per week but processes 50 items per run scores the same as a workflow that runs daily. The total items processed per week is the relevant measure when individual runs process batches.
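The frequency rubric and the volume modifier can be sketched as a small function. This is a minimal illustration, not a definitive rule: the function name `frequency_score` and its parameters are hypothetical, and the numeric cut-offs are assumptions, because the rubric's bands overlap at five runs per week (a workday-daily workflow and a "3–5 times per week" workflow both land on 5). Here, five or more weekly instances is treated as daily.

```python
def frequency_score(runs_per_week: float, items_per_run: int = 1) -> int:
    """Score Dimension 1 (0-3) from weekly instances.

    Volume modifier: when runs process batches, the total number of
    items handled per week is what gets scored, not the run count.
    """
    weekly_instances = runs_per_week * items_per_run
    if weekly_instances >= 5:   # assumed cut-off for "daily or more"
        return 3
    if weekly_instances >= 3:   # "3-5 times per week"
        return 2
    if weekly_instances >= 1:   # "weekly"
        return 1
    return 0                    # "monthly or less"
```

Under these assumptions, `frequency_score(1, items_per_run=50)` and `frequency_score(7)` both return 3, which matches the volume modifier described above.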
Dimension 2: Structure and predictability
What it measures: how consistent the inputs are, how predictable the logic is, and how well-defined the expected output is.
Why it matters: AI performs best on structured, predictable inputs with defined outputs. Unstructured inputs, unpredictable logic, and vague outputs are the characteristics that make AI workflows unreliable.
| Characteristic | Score |
|---|---|
| Inputs are consistent and well-defined; logic is rule-based; output format is specified | 3 |
| Inputs are mostly consistent with some variation; logic is primarily rule-based with occasional exceptions | 2 |
| Inputs vary significantly; logic requires some contextual interpretation; output format is flexible | 1 |
| Inputs are highly variable; logic requires substantial judgment; output is context-dependent | 0 |
The most common finding: workflows score lower on this dimension than expected because the person running them has not recognised the extent to which their personal judgment is filling in for documented rules.
The structured interview from the workflow mapping article surfaces these invisible judgment calls.
Dimension 3: Judgment content
What it measures: how much of the workflow requires human judgment (decisions based on relationship context, situational nuance, or professional accountability) rather than rule-following.
Why it matters: this is the dimension most founders underestimate.
A workflow that appears simple may contain several judgment calls made so automatically that they feel like process steps.
The invoice exception decision, the tone calibration for the client relationship, the determination of whether a situation warrants escalation: these are judgment calls even when they feel routine.
| Judgment content | Score |
|---|---|
| Minimal judgment; outputs are primarily determined by rules and data | 3 |
| Low to moderate judgment; some contextual decisions but primarily rule-based | 2 |
| Moderate judgment; multiple decisions require contextual knowledge or professional discretion | 1 |
| High judgment; the quality of the output depends primarily on the specific human’s knowledge, relationships, or accountability | 0 |
The honest scoring test: “If ten different competent people ran this workflow with the same inputs and instructions, would they produce substantially the same output?”
- If yes: low judgment content
- If they would vary significantly: higher judgment content
Score this dimension honestly. The temptation is to rate the judgment content of familiar tasks as lower than it really is, and so give them a higher score, because they feel automatic.
Dimension 4: Consequence of error
What it measures: what happens when the AI produces a wrong output.
This dimension runs on different logic from the other three. A wrong output in a low-consequence workflow is recoverable; a wrong output in a high-consequence workflow has real cost.
Note: high score = low consequence = more AI-appropriate.
| Consequence of error | Score |
|---|---|
| Easily reversible; errors are caught in review; no external impact | 3 |
| Recoverable; errors may cause internal rework but are caught before external impact | 2 |
| Significant; errors have client-visible impact, financial cost, or reputational consequence if not caught | 1 |
| Severe; errors have immediate financial, legal, or relationship-critical consequences | 0 |
The consequence modifier: a workflow scoring 0 on consequence of error is not automatically excluded from AI. It is automatically flagged for a mandatory human checkpoint with an explicit review protocol. The consequence score shapes how the workflow is designed, not just whether AI touches it.
The readiness matrix: scoring and categorising each workflow
Add the scores across all four dimensions. Maximum: 12. Minimum: 0.
| Total score | Classification | What it means |
|---|---|---|
| 9–12 | Build now | High readiness; strong AI candidate; build first |
| 6–8 | Build after prerequisite | Medium readiness; one or two dimensions need work; identify the specific gap and address it before building |
| 3–5 | Build with caution | Low-medium readiness; multiple gaps; AI can assist but full automation requires significant design work |
| 0–2 | Keep human | Low readiness or high consequence; this workflow stays human-operated |
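The matrix above, together with the consequence modifier from Dimension 4, can be sketched in a few lines. The function name `classify` and the keys of the returned dictionary are hypothetical; the score bands and labels come straight from the table.

```python
def classify(frequency: int, structure: int, judgment: int, consequence: int) -> dict:
    """Sum the four dimension scores and map the total to a readiness band."""
    scores = (frequency, structure, judgment, consequence)
    if any(s not in (0, 1, 2, 3) for s in scores):
        raise ValueError("each dimension score must be 0, 1, 2, or 3")

    total = sum(scores)
    if total >= 9:
        label = "Build now"
    elif total >= 6:
        label = "Build after prerequisite"
    elif total >= 3:
        label = "Build with caution"
    else:
        label = "Keep human"

    return {
        "total": total,
        "classification": label,
        # Consequence modifier: a 0 on Dimension 4 does not exclude the
        # workflow; it flags it for a mandatory human checkpoint.
        "mandatory_human_checkpoint": consequence == 0,
    }
```

For example, `classify(3, 3, 3, 0)` totals 9 and lands in "Build now", but with the checkpoint flag set.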
The “build after prerequisite” category: what the gap usually is
The most common finding is a workflow scoring 7 or 8 with one dimension lagging behind the others: for example, three dimensions at 2 and one at 1 (total 7), or one at 3, two at 2, and one at 1 (total 8). The gap dimension tells the builder what to fix before building:
| Gap dimension | What the prerequisite usually is |
|---|---|
| Low frequency score | Assess whether the volume justifies the build cost; may not |
| Low structure score | Workflow mapping + input standardisation; 1–2 weeks of work |
| Low judgment score | Document the decision rules; add as AI operating rules; 1–2 hours per rule set |
| Low consequence score | Design explicit human checkpoint with approval protocol; 1–2 hours |
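The lookup in the table above can be sketched as follows. One assumption is labelled here and in the code: the "gap dimension" is taken to be whichever dimension (or dimensions, on a tie) has the lowest score. The names `PREREQUISITES` and `find_gaps` are hypothetical.

```python
# Prerequisite text per gap dimension, copied from the table above.
PREREQUISITES = {
    "frequency": "Assess whether the volume justifies the build cost; may not",
    "structure": "Workflow mapping + input standardisation; 1-2 weeks of work",
    "judgment": "Document the decision rules; add as AI operating rules; 1-2 hours per rule set",
    "consequence": "Design explicit human checkpoint with approval protocol; 1-2 hours",
}


def find_gaps(scores: dict[str, int]) -> list[tuple[str, str]]:
    """Return (dimension, prerequisite) pairs for the lowest-scoring dimension(s).

    Assumption: the gap is the lowest score; ties return every tied dimension.
    A workflow scoring 3 everywhere has no gap.
    """
    lowest = min(scores.values())
    if lowest == 3:
        return []
    return [(dim, PREREQUISITES[dim]) for dim, s in scores.items() if s == lowest]
```

A workflow scored `{"frequency": 3, "structure": 1, "judgment": 2, "consequence": 2}` would return the structure prerequisite only.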
Running the audit: a practical two-hour process
What you need before starting
- The workflow inventory (the list of all recurring workflows with frequency and time estimates)
- 30–45 minutes of time from the person who runs each workflow (via interview or self-assessment form)
Step 1: Prepare the scoring sheet (15 minutes)
A simple spreadsheet with columns:
- Workflow name; owner role; frequency (weekly runs); time per run (minutes)
- Total weekly time cost (frequency × time)
- Dimension 1 score; Dimension 2 score; Dimension 3 score; Dimension 4 score
- Total score; classification; prerequisite gap (if applicable)
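If a script is preferred over a hand-built spreadsheet, the columns above could be modelled roughly like this. It is a sketch under stated assumptions: the class name `WorkflowRow`, the field names, the column headers, and the example row are all illustrative, and only Python's standard `csv` and `dataclasses` modules are used.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class WorkflowRow:
    """One row of the scoring sheet; field names are illustrative."""
    name: str
    owner_role: str
    runs_per_week: float
    minutes_per_run: float
    frequency: int      # Dimension 1 score (0-3)
    structure: int      # Dimension 2 score (0-3)
    judgment: int       # Dimension 3 score (0-3)
    consequence: int    # Dimension 4 score (0-3)
    classification: str = ""
    prerequisite_gap: str = ""

    @property
    def weekly_minutes(self) -> float:
        # Total weekly time cost = frequency x time per run
        return self.runs_per_week * self.minutes_per_run

    @property
    def total_score(self) -> int:
        return self.frequency + self.structure + self.judgment + self.consequence


HEADER = ["Workflow name", "Owner role", "Weekly runs", "Minutes per run",
          "Weekly minutes", "D1", "D2", "D3", "D4",
          "Total score", "Classification", "Prerequisite gap"]


def to_csv(rows: list[WorkflowRow]) -> str:
    """Render the sheet as CSV text that a spreadsheet tool can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    for r in rows:
        writer.writerow([r.name, r.owner_role, r.runs_per_week, r.minutes_per_run,
                         r.weekly_minutes, r.frequency, r.structure, r.judgment,
                         r.consequence, r.total_score, r.classification,
                         r.prerequisite_gap])
    return buffer.getvalue()


# Made-up example row, not data from a real audit.
example = WorkflowRow("Invoice intake", "Finance", 20, 20, 3, 3, 3, 2)
print(to_csv([example]))
```

Computing the weekly time cost and total score as properties keeps them from drifting out of sync with the underlying entries.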
Step 2: Score each workflow (60–90 minutes)
The most reliable method: the person who runs the workflow completes a self-assessment form for Dimensions 1 and 4 (frequency and consequence are mostly factual). A structured interview covers Dimensions 2 and 3 (structure and judgment require more nuanced assessment).
The structured interview questions:
For structure (Dimension 2): “If I asked you to write a step-by-step instruction guide that anyone could follow for this workflow, how long would it take and how complete would the guide be?”
- Guide takes 30 minutes to write and covers 95% of cases: high structure score (3)
- Guide would take a day and would still have significant judgment gaps: low structure score (0–1)
For judgment (Dimension 3): “If you were out for two weeks and a competent person who had never run this workflow before followed your instruction guide, what would their output quality be compared to yours?”
- Same quality: low judgment content (3)
- Significantly lower quality: high judgment content (0–1)
Step 3: Classify and prioritise (15 minutes)
Sort the scoring sheet by total score (highest to lowest). Apply the classification to each workflow. For every “Build after prerequisite” classification, fill in the prerequisite gap column with the specific work required.
Step 4: Produce the build list (15 minutes)
Build now (first sprint; next 4–6 weeks): The top three to five workflows by total score. Highest frequency, highest structure, lowest judgment, most recoverable consequences.
Build after prerequisite (second sprint; 6–12 weeks): The workflows that score 6–8 with a specific identified gap. List the gap and the prerequisite work for each.
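Keep human (ongoing): The workflows that score 0–2, plus, for this build cycle, the 3–5 “Build with caution” workflows and any workflow that scores 0 on consequence and does not yet have a designed human checkpoint. Document why each one stays human; this list is useful when the question “why aren’t we automating X?” arises.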
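Steps 3 and 4 can be sketched as one sorting and grouping pass. The function name `build_list`, the dictionary keys, and the `build_now_overflow` bucket are assumptions added for illustration; the article only specifies the top three to five workflows for the first sprint, so anything scoring 9+ beyond that limit is parked rather than dropped.

```python
def build_list(workflows: list[dict], first_sprint_size: int = 5) -> dict[str, list[str]]:
    """Group scored workflows into the Step 4 lists, highest total first.

    Each workflow dict needs the keys: name, total, consequence.
    """
    ranked = sorted(workflows, key=lambda w: w["total"], reverse=True)
    result = {
        "build_now": [],
        "build_now_overflow": [],        # hypothetical bucket for later sprints
        "build_after_prerequisite": [],
        "keep_human": [],
    }
    for w in ranked:
        # Step 4 keeps totals of 0-5 and any 0-on-consequence workflow
        # off the build lists.
        if w["total"] <= 5 or w["consequence"] == 0:
            result["keep_human"].append(w["name"])
        elif w["total"] <= 8:
            result["build_after_prerequisite"].append(w["name"])
        elif len(result["build_now"]) < first_sprint_size:
            result["build_now"].append(w["name"])
        else:
            result["build_now_overflow"].append(w["name"])
    return result
```

Because Python's `sorted` is stable, workflows with equal totals keep the order they had in the scoring sheet.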
The most AI-ready workflows in mid-market companies: a reference list
Consistently high-scoring (Build now category for most companies)
| Workflow | Function | Why high-scoring |
|---|---|---|
| Invoice intake, matching, and exception flagging | Finance | High frequency; high structure; low judgment; recoverable errors |
| AR ageing monitoring and draft collections communications | Finance | Daily/weekly; structured; rule-based logic; recoverable |
| Pipeline summary generation | Sales | Weekly; CRM data is structured; rule-based; low consequence |
| Meeting transcript processing and action item extraction | Operations | Per-meeting; structured input; defined output; recoverable |
| Support ticket triage and first-draft response | Support | High frequency; consistent input types; rule-based; recoverable |
| Expense report coding and anomaly flagging | Finance | Weekly; structured; rule-based; recoverable |
| Weekly client status update draft | Account management | Weekly; structured PM data input; defined format; recoverable |
Consistently medium-scoring (Build after prerequisite for most companies)
| Workflow | Typical gap | Prerequisite |
|---|---|---|
| Client proposal first draft | Variable inputs; requires client archetype context | Complete client archetypes in context pack |
| New lead qualification brief | Judgment on fit | Document qualification criteria |
| HR candidate screening brief | Judgment on fit | Document evaluation criteria |
| Monthly performance narrative | Judgment on how to frame variances | Financial narrative standards documented |
Consistently low-scoring (Keep human for most companies)
- Pricing decisions for non-standard engagements
- Client complaint and dispute response
- Hiring and promotion decisions
- Strategic planning and market positioning
- Partnership negotiation
- Board and investor communications
Common questions on the AI workflow audit
“Can I run this audit myself or do I need an outside facilitator?”
You can run it yourself; the scoring rubric above makes each dimension objective enough to score without an outside facilitator.
The risk of self-running: founders tend to underestimate judgment content in workflows they have run hundreds of times themselves, because the judgment has become invisible through repetition.
Have the person who actually runs the workflow score Dimension 3, not the founder.
“What if different team members score the same workflow very differently?”
The disagreement is the insight.
Two team members scoring the same workflow with different structure scores (one says 3; one says 1) means the workflow is not actually standardised; one person has developed a systematic approach and the other is improvising.
Resolve by interviewing both and documenting the difference. The more systematic version is the target; build the AI on that version.
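Spotting these disagreements across a whole scoring sheet can be automated with a short check. This is only a sketch: the function name `disagreements`, the rater names in the usage note, and the two-point threshold are assumptions (the threshold mirrors the "one says 3; one says 1" example).

```python
def disagreements(scores_by_rater: dict[str, dict[str, int]],
                  min_spread: int = 2) -> dict[str, int]:
    """Return {dimension: spread} where raters differ by at least min_spread points."""
    dimensions = set().union(*(s.keys() for s in scores_by_rater.values()))
    flagged = {}
    for dim in sorted(dimensions):
        values = [s[dim] for s in scores_by_rater.values() if dim in s]
        spread = max(values) - min(values)
        if spread >= min_spread:
            flagged[dim] = spread
    return flagged
```

With two hypothetical raters, `disagreements({"rater_a": {"structure": 3, "judgment": 2}, "rater_b": {"structure": 1, "judgment": 2}})` flags only `structure`, which is the dimension to resolve by interview.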
“How do I handle a workflow that scores 9+ but is extremely sensitive politically?”
Score and classify it accurately. The political sensitivity is a separate consideration from the AI readiness score.
Document it as “Build now; with stakeholder alignment required before launch.” The prerequisite is not technical; it is a conversation with the relevant stakeholders about how the automated workflow will be governed, reviewed, and overridden.
“Should I audit every workflow or just the obvious candidates?”
Audit every recurring workflow on the inventory, even the ones that seem obviously human or obviously AI.
The “obviously human” workflows sometimes contain execution components that AI can handle. The “obviously AI” workflows sometimes contain more judgment than they appear to.
The audit removes “obvious” from the decision. Obvious is the source of most wrong build decisions.
“What happens after I have the build list: what is the next step?”
Map the top three to five workflows on the “build now” list using the five-component mapping framework (trigger, inputs, decision points, human checkpoints, expected outputs). The map is the specification the AI is built from.
Do not start configuring AI before the map is complete. The workflow audit tells you what to build. The workflow map tells you how to build it.
“How do I handle workflows that are about to change significantly?”
Do not automate them yet. A workflow that is being significantly redesigned in the next 60 days is a poor investment for AI configuration now; the automation will be wrong on day one.
Map the intended future version of the workflow alongside the current version. Build the AI on the intended version after the operational change is stable and has been running consistently for 30+ days.
Want the workflow audit run on your specific business, and the build list sequenced for the fastest stable results?
The AI workflow audit takes two hours and produces a prioritised build list that prevents six months of building in the wrong order.
The four dimensions (frequency, structure, judgment, and consequence) provide a systematic framework for what should be intuitive but rarely is.
The workflows that consistently score highest are the ones most founders do not build first: the operational data workflows that run every day, require no judgment, and have recoverable errors. Building these first produces three stable, compounding automation wins within six weeks.
Path one: run the scoring sheet this week. List every recurring workflow in the business. Score each against the four dimensions using the rubric above. Sort by total score. The top three workflows on that list are your immediate build candidates.
Path two: bring in a partner. If you want the workflow audit run with the depth that comes from having done it forty times before, and the build list sequenced for the fastest stable results, that is the Phase 1 work Phos AI Labs does. We have run 400+ AI engagements. Clients include Zapier, Coca-Cola, Medtronic, Dataiku, and American Express. Thirty minutes, no deck. Start here.