KindLM

Regression testing and compliance guardrails for agentic AI workflows.

npm install -g @kindlm/cli
kindlm init
kindlm test kindlm.yaml

What KindLM Does

KindLM is a CLI tool that runs test suites against your LLM-powered features and agents. Define test cases in YAML, run them locally or in CI, and get clear pass/fail results with actionable failure reasons.

What makes it different:

  • Agent-aware — assert on tool calls (which tool was called, with what arguments, in what order), not just text output
  • LLM-as-judge — evaluate subjective quality ("is this response empathetic?") using configurable judge models
  • Compliance reports — generate EU AI Act–aligned test documentation for auditors
  • CI-native — exit codes, JUnit XML, JSON reports. Drop it into your pipeline in 5 minutes

Quick Example

# kindlm.yaml
kindlm: 1
project: "my-agent"

providers:
  anthropic:
    apiKeyEnv: "ANTHROPIC_API_KEY"

models:
  - id: "claude-sonnet"
    provider: "anthropic"
    model: "claude-sonnet-4-5-20250929"
    params:
      temperature: 0.2

prompts:
  support:
    system: |
      You are a support agent. Use lookup_order(order_id) to find orders.
      Respond in JSON with { "action": string, "message": string }.
    user: "{{message}}"

tests:
  - name: "refund-request"
    prompt: "support"
    vars:
      message: "Refund order #123 please"
    tools:
      - name: "lookup_order"
        responses:
          - when: { order_id: "123" }
            then: { order_id: "123", total: 49.99, status: "delivered" }
    expect:
      output:
        format: "json"
        schemaFile: "./schemas/response.schema.json"
      toolCalls:
        - tool: "lookup_order"
          argsMatch: { order_id: "123" }
      guardrails:
        pii:
          enabled: true
        keywords:
          deny: ["not my problem"]
      judge:
        - criteria: "Response acknowledges the refund request professionally"
          minScore: 0.8

gates:
  passRateMin: 0.95
  schemaFailuresMax: 0
$ kindlm test kindlm.yaml

  ✓ refund-request (claude-sonnet) 3/3 passed [1.2s]

  Pass rate: 100% | Schema: 0 failures | Judge avg: 0.92
  Gates: ✓ PASSED

Features

Assertions

TypeWhat It Checks
SchemaJSON parse + JSON Schema validation (AJV)
PIIRegex patterns for SSN, credit cards, emails, custom
KeywordsDeny list (forbidden words) + allow list
JudgeLLM evaluates output against natural language criteria
Tool callsCorrect tool, correct arguments, correct order, shouldNotCall
DriftCompare output against a baseline (LLM judge or field diff)
ContainsOutput must/must not contain specific substrings

Reports

  • Terminal — colored summary with top failures
  • JSON — full structured report for programmatic use
  • JUnit XML — plug into any CI system's test reporting
  • Compliance — EU AI Act–aligned markdown report with audit hashes

Agent Testing

KindLM simulates tool responses so you can test agent behavior without calling real APIs:

tools:
  - name: "lookup_order"
    responses:
      - when: { order_id: "123" }
        then: { order_id: "123", total: 49.99 }
    defaultResponse: { error: "Order not found" }

The engine runs a multi-turn conversation: sends the prompt, intercepts tool calls, returns simulated responses, and continues until the model produces a final text response. Then all assertions run against the full conversation.

Baseline Drift Detection

# Save a baseline
kindlm baseline set kindlm-report.json --label "v2.0-release"

# Future runs compare against it
kindlm test kindlm.yaml --baseline latest

Drift is measured using LLM-as-judge comparison by default — it understands semantic changes, not just string differences.

Compliance Reports

compliance:
  enabled: true
  framework: "eu-ai-act"
  outputDir: "./compliance-reports"
  metadata:
    systemName: "Customer Support Agent"
    riskLevel: "limited"
    operator: "ACME Corp"

Generates a structured markdown document mapping test results to EU AI Act Annex IV requirements, with SHA-256 artifact hashes for audit trail.

CI Integration

# GitHub Actions
- run: kindlm test kindlm.yaml --junit junit.xml --format json
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

Exit code 0 = all gates passed. Exit code 1 = gates failed. That's it.

Cloud (Optional)

Upload results to KindLM Cloud for history, trends, and team collaboration:

kindlm login
kindlm test kindlm.yaml --upload true

The CLI works fully offline with every feature. Cloud is optional and adds:

Open Source (Free)Team ($49/mo)Enterprise ($299/mo)
CLI (all assertions, providers, reports)
Compliance reports (local markdown)
Cloud dashboard + test history90 daysUnlimited
Team members10Unlimited
Compliance PDF export
Signed compliance reports
SSO / SAML
Audit log API
Slack + webhook alerts

Philosophy: The open-source CLI solves the engineering problem. Cloud solves the organizational problem. Engineers choose the tool. Their company pays for the dashboard.

Documentation

Visit kindlm.com/docs for the full documentation, including:

License

MIT