Generative AI Capabilities: What It Can and Cannot Do Today

A practical assessment of generative AI capabilities for business teams: where it excels, where it struggles, and what this means for which workflows to automate first.

Phos Team · AI Strategy

The business value of generative AI depends on matching the right tasks to the AI’s actual capabilities, not its theoretical potential. This article gives you a practical assessment of where AI performs reliably, where it does not, and how to use that information to sequence your deployments.


What generative AI does reliably well

Long-form writing and editing

Generative AI produces coherent, well-structured first drafts for reports, proposals, email campaigns, documentation, and articles. For most professional writing tasks, the quality is high enough that editing the AI draft is significantly faster than writing from scratch.

The specific capabilities here include: maintaining a consistent argument across a multi-page document, adapting to a specified tone or voice, following formatting requirements, and generating multiple variations of the same content for testing or personalization.

Summarization and extraction

Summarizing long documents, extracting key points, and identifying relevant information from large text sets are among the most reliable AI capabilities. AI can process a 50-page contract and produce a one-page summary of key terms, or scan 100 customer feedback responses and extract the most common themes.

The accuracy of summarization is high when the AI is working from provided text (rather than from memory). The risk is selective summarization: AI may omit information that the reader would consider important. Human review of summaries for completeness is good practice.

Question answering from provided documents

When you give AI a document and ask specific questions about its content, the accuracy is high. This capability supports legal review, contract analysis, research synthesis, and technical documentation review at a level of reliability that makes it practically useful without constant fact-checking.

The important caveat: the answer is only as reliable as the document’s coverage of the question. If the relevant information is not in the document, the AI will not reliably say “I don’t know.” It may instead generate a plausible-sounding answer from its training data. For factual queries, verify answers against the source document.
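One way to make that verification cheaper is to ask the AI to return a supporting quote with each answer and then check mechanically that the quote appears in the document. The sketch below assumes that workflow. The function names are hypothetical, and the check only confirms that the quoted text exists in the source, not that the answer interprets it correctly.

```python
import re

def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_is_in_source(quote: str, source: str) -> bool:
    """Return True only if the non-empty quote appears verbatim in the source text."""
    normalized_quote = normalize(quote)
    return bool(normalized_quote) and normalized_quote in normalize(source)

# Illustrative contract text (invented for this example).
contract = "The term of this Agreement is\n  24 months from the Effective Date."
supported = quote_is_in_source("term of this agreement is 24 months", contract)
unsupported = quote_is_in_source("term of this agreement is 36 months", contract)
```

An answer whose quote fails this check is a candidate for the "plausible but not in the document" case and should go to a human reviewer.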

Code generation and review

For common programming languages and well-defined functionality, AI code generation is reliable enough to measurably accelerate development. AI can write functions from specifications, generate unit tests, produce documentation, and identify common code quality issues.


Where it struggles

Precise numerical computation

AI can set up calculations, interpret numbers, and generate analysis narratives, but it makes arithmetic errors at a rate that makes it unreliable for precise calculation without verification. For any workflow where exact numbers matter, the AI should produce the setup and the interpretation, while the actual computation runs through a dedicated calculation tool.
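As a minimal sketch of that split, assume the AI returns the calculation as a plain arithmetic expression and a small deterministic evaluator does the actual computing. The `evaluate` helper below is an illustration, not a specific product feature. It accepts only numbers and basic operators and rejects anything else.

```python
import ast
import operator

# Map the AST operator nodes we allow to their arithmetic functions.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def evaluate(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        # Numeric literals only (booleans are excluded on purpose).
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        # Names, function calls, and everything else are refused.
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

# The AI proposes the setup as text; the tool computes the number.
annual_net = evaluate("(1250 * 12) - 3000")
```

In this arrangement the AI is responsible for choosing the formula and explaining the result, and a reviewer checks the formula rather than the arithmetic.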

Current events and real-time information

AI training data has a cutoff. The model does not know what happened after that cutoff unless given current information in the prompt. For any analysis that requires current data, provide the data in the prompt rather than asking the AI to recall it.

Novel or highly specialized domain knowledge

AI trained on general text performs less well on highly specialized domains with limited training data. For niche technical domains, emerging regulatory frameworks, or highly specialized professional knowledge, AI outputs require more careful expert review.

Complex multi-step reasoning with interdependencies

AI can reason through complex problems, but it sometimes loses track of earlier constraints in multi-step reasoning chains, leading to internally inconsistent conclusions. For complex analytical work, breaking the task into sequential steps and reviewing each intermediate output is more reliable than asking for a single comprehensive answer.


Hallucinations: what they are and how to manage them

A hallucination is when AI generates a plausible-sounding but factually incorrect statement. This is not a bug. It is a consequence of how language models work: they predict what text is most likely to follow, and sometimes the most statistically likely text is factually wrong.

Hallucinations are more common when:

  • the AI is asked about specific facts outside its training data
  • the prompt does not provide the relevant information and expects the AI to recall it
  • the question has a specific correct answer that is not well-represented in training data
  • the AI is generating long-form content with many specific factual claims

The practical mitigation approach:

Provide the source material. When you need factual accuracy, give the AI the relevant documents and ask it to work from them, rather than asking it to recall information.

Verify specific claims. Any AI output that makes specific factual claims, including statistics, citations, dates, or numerical data, should be verified before use in client-facing or high-stakes content.

Use AI for structure and prose, verify facts separately. A practical workflow: use AI to draft the structure and prose, then manually verify every specific fact referenced in the draft. This combines AI’s writing efficiency with human verification of accuracy.


Capability table by task type

| Task type | AI reliability | Recommended use | Human review needed |
| --- | --- | --- | --- |
| First-draft writing | High | Primary drafter | Light editing |
| Document summarization | High | Primary summarizer | Completeness check |
| Q&A from provided docs | High | Research assistant | Spot verification |
| Code generation | High for common languages | Co-pilot | Functional review |
| Numerical calculation | Low | Setup only, not execution | Always verify |
| Current events | Low without tools | Avoid or provide data | Always verify |
| Complex analysis | Medium | First-pass analysis | Deep review |
| Creative ideation | High | Brainstorming partner | Selection and refinement |
| Legal interpretation | Medium | Research support | Attorney review required |
| Financial projection | Low for calculation, high for narrative | Narrative only | All numbers verified |

How capabilities are evolving

AI capabilities are improving rapidly. Tasks that required significant human review 18 months ago require less review today. Specific capabilities that have improved materially in 2026 include: reasoning accuracy, longer document handling, and tool use (AI calling external tools like calculators and search engines to compensate for its own limitations).

The direction of travel is toward higher reliability across more task types, with specific tool integrations compensating for areas where the base model is weak. The practical implication: capabilities you test today may be more reliable six months from now. Build measurement into your deployments so you can update your quality review requirements as the underlying capability improves.


Matching capabilities to use cases

Use the capabilities above to decide which workflows to automate first, in this order:

Start with summarization and first-draft writing. These are the most reliable capabilities with the highest time savings. Proposals, reports, and client communications are the natural first deployment.

Add Q&A and research support. Once the team is comfortable with first-draft workflows, adding document Q&A and research synthesis expands the value significantly.

Expand to specialized applications carefully. Code generation, legal analysis, and financial narrative require higher-quality context packs and more careful review protocols. Deploy these after the team has built AI quality review skills on simpler tasks.

For the full deployment framework, see AI strategy vs AI implementation.


Frequently asked questions

How do I know if an AI output is accurate enough to use?

For factual claims: verify them against source documents or trusted references before using in high-stakes content. For structure and prose quality: edit for tone, accuracy, and completeness. The standard is: would a knowledgeable human reviewer be comfortable with this output after your review? If yes, the accuracy is sufficient. If the output requires more than 20% to 25% editing to reach that standard, the context and prompting need improvement.
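The article does not define how to measure the share of editing, so the sketch below makes an assumption: it treats the fraction of words that differ between the AI draft and the final text as a rough proxy. The function names and the 0.25 default (the upper end of the range above) are illustrative choices.

```python
import difflib

def edit_fraction(ai_draft: str, final_text: str) -> float:
    """Approximate share of the text that changed, from 0.0 (none) to 1.0 (all)."""
    draft_words = ai_draft.split()
    final_words = final_text.split()
    # ratio() is 2 * matching_words / total_words across both versions.
    similarity = difflib.SequenceMatcher(None, draft_words, final_words).ratio()
    return 1.0 - similarity

def needs_better_context(ai_draft: str, final_text: str, threshold: float = 0.25) -> bool:
    """Flag drafts whose editing share exceeds the chosen threshold."""
    return edit_fraction(ai_draft, final_text) > threshold

light_edit = edit_fraction("a b c d", "a b c x")
```

Word-level difference undercounts edits that are small in size but large in meaning, such as a corrected figure, so treat the number as a trend indicator across many drafts rather than a verdict on any single one.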

Is AI accuracy improving fast enough to change what I should deploy today?

Yes, but not uniformly. Write today’s workflow design for today’s capabilities, with review steps calibrated to today’s reliability. Build a review of capability changes into your quarterly program cadence. When a specific capability improves enough to reduce review requirements, update the workflow. Do not avoid deployments today because you expect capabilities to improve. The opportunity cost of waiting is real.

What is the most reliable test of AI quality for my specific use cases?

Test on representative real work samples, not hypothetical or simplified examples. Take 10 real examples of the output you need (real proposals, real reports, real client communications), prompt the AI to produce equivalent outputs, and measure editing time and quality against your standards. This is the only reliable way to know how AI performs on your actual tasks.
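A small script can keep the results of such a test comparable between workflows. The sketch below assumes you record, for each sample, the minutes spent editing and whether the edited output met your standard. The record format and the `summarize_trial` name are hypothetical, and the figures are invented for illustration.

```python
from statistics import median

def summarize_trial(samples):
    """Summarize a test run.

    samples: list of (editing_minutes, met_standard) tuples, one per real work sample.
    """
    if not samples:
        raise ValueError("need at least one sample")
    minutes = [editing_minutes for editing_minutes, _ in samples]
    passed = sum(1 for _, met_standard in samples if met_standard)
    return {
        "samples": len(samples),
        "median_editing_minutes": median(minutes),
        "pass_rate": passed / len(samples),
    }

# Four invented proposal samples: (minutes spent editing, met the standard?)
proposal_trial = summarize_trial([(10, True), (20, True), (30, False), (40, True)])
```

Running the same summary on each candidate workflow gives a like-for-like basis for choosing where to deploy first.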


Ready to match AI capabilities to your workflows?

You now have the capability map: where AI performs reliably, where to be cautious, and how to sequence your deployments. The next step is testing those capabilities against your specific use cases.

Path one: run a capability test on your top three workflows. Take real examples of each workflow output, run them through a leading AI model with good context, and measure editing time. This data tells you exactly where to start.

Path two: work with Phos AI Labs. If you want an experienced partner to assess AI performance on your specific workflows and build the deployment plan, Phos AI Labs is a CCA-F certified Claude implementation partner. Thirty minutes, no deck. Start here.
