Claude Code in CI/CD Pipelines

How to integrate Claude Code into CI/CD pipelines using headless mode, covering PR review, test generation, changelogs, and security scanning.

Phos Team

CI/CD pipelines automate the mechanical parts of software delivery. Claude Code running in headless mode extends that automation into work that previously required a human: reviewing pull requests, generating test cases, writing changelogs, and flagging security issues.

The key word is “headless.” Interactive Claude Code requires a developer at the keyboard. Headless mode runs Claude Code non-interactively and emits output that the pipeline can read programmatically, so it fits into the steps that already exist in your delivery workflow.


What Headless Mode Enables

Headless mode is invoked with the --print flag, which tells Claude Code to produce output and exit rather than open an interactive session. Combined with --output-format json, the output becomes parseable by downstream pipeline steps.

This makes Claude Code a pipeline tool rather than a developer tool. The pipeline calls Claude Code, Claude Code performs a task on the code it receives, and the pipeline reads the result and acts on it.
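The call-and-read loop described above can be sketched in Python. This is a minimal sketch, not a definitive integration: it assumes the JSON output is a single object whose reply text sits in a `result` field with failures flagged by `is_error`, which you should verify against the CLI version you install. The helper names `parse_headless_output` and `run_headless` are hypothetical.

```python
import json
import subprocess


def parse_headless_output(raw: str) -> str:
    """Extract the reply text from `--output-format json` output.

    Assumption: the payload has a `result` field holding the reply and
    an `is_error` flag. Check this against your installed CLI version.
    """
    payload = json.loads(raw)
    if payload.get("is_error"):
        raise RuntimeError("Claude Code reported an error")
    return payload["result"]


def run_headless(prompt: str, stdin_text: str) -> str:
    """Call Claude Code non-interactively and return the reply text."""
    completed = subprocess.run(
        ["claude", "--print", "--output-format", "json", prompt],
        input=stdin_text,      # e.g. the contents of diff.txt
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError on non-zero exit
    )
    return parse_headless_output(completed.stdout)
```

A pipeline step would call `run_headless("Review this diff...", diff_text)` and then decide what to do with the returned text.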

The shift from interactive to headless is the shift from a productivity tool to an infrastructure component. The same model capabilities are available; the usage pattern is fundamentally different.

Four automation patterns emerge consistently when teams first integrate Claude Code into their pipelines. Each has a natural home in the delivery workflow.


The 4 Automation Patterns

Pattern 1: Automated PR Review

Claude Code reads a pull request diff, applies a review lens (style, error handling, obvious bugs, test coverage), and posts structured feedback as a PR comment. For teams using GitHub Actions specifically, the GitHub Actions integration guide covers the official claude-code-action and workflow templates in detail.

This runs as a step triggered on pull_request events. The review happens before a human reviewer sees the PR, reducing the time human reviewers spend on mechanical issues.

Human reviewers then focus on architecture, business logic, and design decisions that require context the model does not have.

Invocation pattern: claude --print --output-format json "Review this diff..." < diff.txt

Pattern 2: Test Generation

Claude Code reads newly added or modified functions and generates unit test cases for them. The generated tests are committed to a branch or posted as a PR comment for developer review before merging.

This pattern works best for pure functions with clear inputs and outputs. It is less effective for code with heavy side effects or external dependencies.

The generated tests still require human review before they enter the test suite.
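One way to decide which functions to send for test generation is to pull newly added function names out of the diff. The sketch below assumes a Python codebase and a unified diff; `added_functions` and `build_test_prompt` are hypothetical helpers, and the regex only sees functions whose `def` line was added, so a function whose body changed but whose signature did not will be missed.

```python
import re

# Matches added lines ("+" prefix in a unified diff) that start a Python function.
ADDED_DEF = re.compile(r"^\+\s*(?:async\s+)?def\s+([A-Za-z_]\w*)\s*\(")


def added_functions(diff_text: str) -> list[str]:
    """Return names of functions whose `def` line was added, in diff order."""
    names: list[str] = []
    for line in diff_text.splitlines():
        match = ADDED_DEF.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def build_test_prompt(names: list[str]) -> str:
    """Build a prompt that scopes test generation to the listed functions."""
    listed = ", ".join(names)
    return (
        "Write unit tests for these new functions, covering normal inputs "
        f"and edge cases: {listed}."
    )
```

Scoping the prompt to named functions keeps the request small and makes the generated tests easier for a developer to review.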

Pattern 3: Changelog Generation

Claude Code reads the commit messages and diff between two tags or commits and generates a structured changelog entry. The output follows a template specified in the prompt: user-facing changes grouped by feature area, with deprecation notices flagged separately.

This eliminates the manual work of assembling changelogs before release. The output requires a quick human review but rarely needs significant editing when commit messages are well-written.

Invocation pattern: claude --print --output-format text "Generate changelog from these commits..." < commits.txt
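Producing the commits.txt input and the templated prompt might look like the following sketch. It assumes releases are marked with git tags; the function names and the exact prompt wording are illustrative, not a prescribed template.

```python
import subprocess


def commit_subjects(from_ref: str, to_ref: str) -> list[str]:
    """Return commit subject lines in the range from_ref..to_ref."""
    log = subprocess.run(
        ["git", "log", "--pretty=format:%s", f"{from_ref}..{to_ref}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in log.stdout.splitlines() if line.strip()]


def changelog_prompt(subjects: list[str], version: str) -> str:
    """Build a prompt that encodes the changelog template from the text."""
    bullet_list = "\n".join(f"- {subject}" for subject in subjects)
    return (
        f"Generate a changelog entry for {version} from these commits.\n"
        "Group user-facing changes by feature area and list deprecation "
        "notices in a separate section.\n\n"
        f"{bullet_list}"
    )
```

For example, `changelog_prompt(commit_subjects("v1.3.0", "v1.4.0"), "v1.4.0")` yields text that can be passed to the headless invocation; the tag names here are placeholders.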

Pattern 4: Security Scanning

Claude Code scans diffs or specific files for common security patterns:

  • Hardcoded credentials: API keys, passwords, and tokens committed to source
  • SQL injection vectors: unsanitized user input passed to query strings
  • Missing input validation: external data used without sanitization
  • Insecure deserialization patterns: unsafe object hydration from user-controlled data

Results are structured by severity and posted as annotations or PR comments. This does not replace a dedicated security scanner. It adds a language-model layer that catches patterns static analysis tools miss, particularly in business logic.

A finding from this step should be reviewed by a developer before the PR is blocked. For teams looking to build out dedicated automated code review workflows, the automated code reviews guide covers that in more depth, and the automated testing guide covers the test generation side.
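Structuring results by severity could be handled by a small formatting step like the one below. The finding schema (`severity`, `file`, `line`, `message`) is an assumption for illustration: it only holds if your prompt asks the model to return findings in that shape, and the output still needs validation before use.

```python
# Lower number sorts first. Unknown severities sort last.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def format_findings(findings: list[dict]) -> str:
    """Render findings as one line each, most severe first."""
    ordered = sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)),
    )
    lines = [
        f"[{f['severity'].upper()}] {f['file']}:{f['line']} {f['message']}"
        for f in ordered
    ]
    return "\n".join(lines) if lines else "No findings."
```

The resulting text can be posted as a PR comment so the developer reviewing the finding sees the most severe items first.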


Setting Up Claude Code in GitHub Actions

The core setup requires three things: the ANTHROPIC_API_KEY stored as a repository secret, a workflow file that invokes Claude Code, and a prompt that specifies exactly what to do with the code it receives.

A basic PR review workflow looks like this:

name: Claude Code PR Review

on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Run PR Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > diff.txt
          claude --print --output-format json \
            "Review this diff for style issues, missing error handling, and obvious bugs. Format output as markdown with severity labels." \
            < diff.txt > review.json

      - name: Post Review Comment
        uses: actions/github-script@v7
        with:
          script: |
            // Claude Code's JSON output carries the reply text in `result`.
            const review = require('./review.json');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: review.result
            });

This is the minimal version. Production implementations add caching, error handling for API failures, and rate limiting logic for high-volume repositories.

Key flags and settings used:

  • --print: non-interactive mode; prints output and exits
  • --output-format json: structured output parseable by downstream steps
  • ANTHROPIC_API_KEY: environment variable, referenced from repository secrets via ${{ secrets.ANTHROPIC_API_KEY }}

Cost Controls for Automated Runs

Automated pipeline runs can generate significant API costs if left unconstrained. A repository with 50 PRs per week, each triggering a review that processes a 500-line diff, can accumulate meaningful token usage quickly.
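A back-of-the-envelope estimate makes the scale concrete. In the sketch below, only the 50 PRs per week and the 500-line diff come from the example above. Tokens per line, per-review prompt overhead, and the price are placeholder assumptions to replace with your own measurements and current pricing. The estimate covers input tokens for a single pass only, so treat it as a lower bound.

```python
def weekly_input_tokens(prs_per_week: int, diff_lines: int,
                        tokens_per_line: int, overhead_tokens: int) -> int:
    """Rough input tokens per week: one review per PR, one pass per review."""
    return prs_per_week * (diff_lines * tokens_per_line + overhead_tokens)


def weekly_input_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a given per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


# tokens_per_line and overhead_tokens are placeholders, not measured values.
tokens = weekly_input_tokens(prs_per_week=50, diff_lines=500,
                             tokens_per_line=10, overhead_tokens=1_000)
print(tokens)  # 300000
```

Passing the result to `weekly_input_cost` with your actual price gives a weekly floor. Output tokens, retries, and pushes that re-trigger the workflow all add to it.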

Three controls reduce cost without sacrificing utility:

  • Diff size limits. Skip the Claude Code step for PRs where the diff exceeds a threshold (e.g., 2,000 lines). Very large PRs are often better reviewed by splitting them into smaller units anyway.

  • File type filtering. Only run Claude Code on changed files of relevant types. A PR that only changes documentation or *.yaml configuration files may not benefit from a code review step.

  • Model tier selection via --model. Use a lighter model tier for high-frequency, lower-stakes tasks like changelog generation. Reserve the full model for security scanning and PR review where output quality matters most.
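The first two controls can be combined into one gate that runs before the Claude Code step. This is a sketch under assumptions: the 2,000-line threshold is the example figure from the list above, and the skipped suffixes are a hypothetical starting set to adapt to your repository.

```python
from pathlib import PurePosixPath

MAX_DIFF_LINES = 2000                         # example threshold from the text
SKIPPED_SUFFIXES = {".md", ".yaml", ".yml"}   # hypothetical; tune per repository


def should_review(diff_line_count: int, changed_files: list[str]) -> bool:
    """Return True when the PR is small enough and touches reviewable files."""
    if diff_line_count > MAX_DIFF_LINES:
        return False  # too large: better split into smaller PRs
    # Review only if at least one changed file is not of a skipped type.
    return any(
        PurePosixPath(path).suffix.lower() not in SKIPPED_SUFFIXES
        for path in changed_files
    )
```

A workflow step can call this with the output of `git diff --name-only` and a line count, then skip the review step when it returns False.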

Set up a monthly spend alert on your Anthropic API account. Unexpected spikes in automated pipeline usage are usually caused by a single high-volume repository or a workflow misconfiguration that causes retries.


What to Automate vs. What to Keep Human

Not everything that Claude Code can do in a pipeline should be automated without human gates. The distinction matters for both quality and organizational trust in the output.

| Task | Automate | Keep Human |
| --- | --- | --- |
| Style and formatting feedback | Yes, post as PR comment | Human reviews, not blocks |
| Missing error handling | Flag automatically | Human confirms before blocking merge |
| Changelog drafting | Yes, generate automatically | Human edits before publishing |
| Security flag: hardcoded credential | Yes, block merge automatically | Human confirms it’s not a false positive |
| Security flag: logic vulnerability | Flag automatically | Human required before any action |
| Test generation | Yes, generate and post | Human reviews before committing to suite |
| Architecture decisions | Never | Always human |
| Business logic review | Never | Always human |
| Breaking change assessment | Flag patterns | Human confirms scope and impact |
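The table can be encoded as a small policy map so the pipeline applies it consistently. The category keys and the `Action` enum are hypothetical names for illustration, and defaulting unknown categories to a comment is a design assumption chosen so that nothing new blocks a merge without a human decision.

```python
from enum import Enum


class Action(Enum):
    COMMENT = "comment"   # surface the finding; a human decides
    BLOCK = "block"       # block the merge automatically
    SKIP = "skip"         # never automated


# One entry per table row. Keys are illustrative category names.
POLICY = {
    "style": Action.COMMENT,
    "missing_error_handling": Action.COMMENT,
    "changelog_draft": Action.COMMENT,
    "hardcoded_credential": Action.BLOCK,
    "logic_vulnerability": Action.COMMENT,
    "test_generation": Action.COMMENT,
    "breaking_change": Action.COMMENT,
    "architecture": Action.SKIP,
    "business_logic": Action.SKIP,
}


def automated_action(category: str) -> Action:
    """Look up the automated action, defaulting to comment-only."""
    return POLICY.get(category, Action.COMMENT)
```

Only one row maps to an automatic block, which keeps the human gate in place for everything else.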

The general principle: use automation to surface issues and drafts. Use humans to confirm, decide, and take action on anything with meaningful consequences. Teams looking to extend this beyond code review into broader workflow automation should explore AI-Native Operations, which applies the same automation-first thinking across the full development and delivery workflow.


Common Questions on Claude Code in CI/CD

Does running Claude Code in CI/CD require a specific Anthropic plan?

Claude Code in headless mode uses the Anthropic API directly, billed per token. Any plan with API access works.

Enterprise teams should confirm their API agreement covers automated pipeline usage, particularly if the code being reviewed contains sensitive IP.

How do we prevent Claude Code from taking destructive actions in the pipeline?

In CI/CD, Claude Code should run in a mode that produces output rather than executing actions. Combining the --print flag with --disallowedTools "Bash" prevents the model from running shell commands during automated review steps.

Treat the output as a recommendation, not an instruction.
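A pipeline wrapper can build the restricted command in one place so every review step uses the same limits. The sketch assumes the CLI's `--disallowedTools` flag accepts a comma-separated list and that `Bash`, `Edit`, and `Write` are valid tool names. Confirm both against `claude --help` for your installed version. `review_command` is a hypothetical helper.

```python
def review_command(prompt: str, disallowed_tools: list[str]) -> list[str]:
    """Build an argument list for a read-only, non-interactive review run."""
    # The prompt goes directly after --print so that a multi-value flag
    # placed later cannot absorb it as another tool name.
    cmd = ["claude", "--print", prompt, "--output-format", "json"]
    if disallowed_tools:
        cmd += ["--disallowedTools", ",".join(disallowed_tools)]
    return cmd
```

Passing the list to `subprocess.run` without `shell=True` also avoids shell quoting problems when the prompt contains quotes or newlines.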

What happens when the Anthropic API is unavailable during a pipeline run?

Build in a fallback: if the Claude Code step fails or times out, the pipeline should continue without blocking the PR. A missing AI review is an inconvenience; a broken pipeline that blocks all merges is an incident.

Make the Claude Code step non-blocking by default by setting continue-on-error: true on the workflow step.
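If the review runs from a script rather than directly in the workflow, the same fallback can be applied in code. This is a minimal sketch: `run_non_blocking` is a hypothetical wrapper, and the timeout value is something to tune for your diffs.

```python
import subprocess
import sys
from typing import Optional


def run_non_blocking(cmd: list[str], stdin_text: str,
                     timeout_s: float) -> Optional[str]:
    """Run a command; return its stdout, or None on any failure or timeout."""
    try:
        done = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # child is killed if it exceeds this
            check=True,         # non-zero exit raises CalledProcessError
        )
    except (subprocess.TimeoutExpired,
            subprocess.CalledProcessError,
            FileNotFoundError) as exc:
        # Log and move on: a missing review must not fail the pipeline.
        print(f"AI review skipped: {exc}", file=sys.stderr)
        return None
    return done.stdout
```

The caller posts a comment only when the return value is not None and exits with status 0 either way.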

How do we measure whether the automated reviews are actually useful?

Track how often developers act on Claude Code comments (accept, edit, or explicitly dismiss with a reason). Low action rates suggest the prompts need refinement or the review is flagging too many low-signal issues.

A useful benchmark: if developers are dismissing more than 60% of automated comments, the prompt needs work.
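The benchmark is straightforward to compute once outcomes are recorded per comment. The sketch below assumes each automated comment is labeled with one of the strings "accepted", "edited", or "dismissed". How you collect those labels (reactions, a bot command, manual tagging) is up to your team.

```python
def dismissal_rate(outcomes: list[str]) -> float:
    """Fraction of automated comments that developers dismissed."""
    if not outcomes:
        return 0.0
    dismissed = sum(1 for outcome in outcomes if outcome == "dismissed")
    return dismissed / len(outcomes)


def prompt_needs_work(outcomes: list[str], threshold: float = 0.60) -> bool:
    """Apply the 60% benchmark from the text."""
    return dismissal_rate(outcomes) > threshold
```

Tracking this per prompt rather than per repository makes it easier to see which review lens is generating the low-signal comments.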


Making CI/CD Automation Work in Practice

The patterns above work. Teams that implement them reduce the mechanical portion of code review and recover time that human reviewers spend on issues a model can catch reliably.

The discipline is in the constraints: clear prompts, cost limits, non-blocking pipeline steps, and human gates on anything that matters.

Path one: build it yourself. The workflow YAML above is the starting point. Add the cost controls, define your prompt for each pattern, and run the pilot on a single repository before expanding. The first two weeks will surface the prompt refinements you need.

Path two: work with Phos AI Labs. If you want the automation patterns designed for your specific workflow, the prompts calibrated against your codebase, and the cost model validated before rollout, that is implementation work we do with development teams. Start the conversation here.
