
Agent-audit for tool-using AI systems

Your agent is the attack surface.

We audit the agents you ship before someone else finds the holes. Hardened artifacts, cited to recognized standards, ready to drop into your CI/CD process.

Proof point: Prompt hardened
Proof point: Tool schemas locked
Proof point: Findings cited
Proof point: CI-ready artifact

Early API invites go to work emails first.

Anchored on

NIST AI RMF
NIST AI 600-1
OWASP LLM Top 10
MITRE ATLAS
NVIDIA Garak
ISO/IEC 42001

§ 01 / Protocol

One POST. One price. Persistent proof.

A one-page protocol. No SDK to integrate, no dashboard to provision, no SSO, no SOW. POST your agent, settle the HTTP 402 payment challenge, and get a hardened artifact back.
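The POST → 402 → 200 loop can be sketched as a small client. This is a minimal sketch under stated assumptions, not the service's client: `send` is a hypothetical transport callable that returns a status code and parsed JSON body, and `pay` is a hypothetical callback that settles the challenge and returns whatever headers the retry needs. The real retry credential is defined by the payment protocol and is not shown on this page.

```python
from typing import Callable

# Hypothetical transport: (method, path, json_body, extra_headers) -> (status, parsed_json)
Send = Callable[[str, str, dict, dict], tuple[int, dict]]
# Hypothetical payment callback: takes the 402 "challenge" object, returns headers for the retry
Pay = Callable[[dict], dict]


def run_audit(send: Send, pay: Pay, agent: dict) -> dict:
    """POST the agent, settle a 402 if one comes back, and return the 200 artifact."""
    status, body = send("POST", "/agent-audit", agent, {})
    if status == 402:
        # Settle the challenge once, then retry the same request with proof of payment.
        retry_headers = pay(body["challenge"])
        status, body = send("POST", "/agent-audit", agent, retry_headers)
    if status != 200:
        raise RuntimeError(f"audit failed with HTTP {status}: {body.get('title', '')}")
    return body
```

Injecting `send` and `pay` keeps the control flow testable without a network or a payment method.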

01 / Ship · POST

Ship the raw agent.

One JSON payload from your CI step, code-review agent, or curl. No client library required.

POST /agent-audit
content-type: application/json

{
  "system_prompt": "...",
  "tools": [ ... ],
  "tier": "surface"
}
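Without a client library, the same request can be built with Python's standard library. This is a sketch under assumptions: `BASE_URL` is a placeholder because the page does not publish a host, and `"active"` as the tier string for the Active tier is assumed by analogy with `"surface"`.

```python
import json
import urllib.request

# Placeholder host: the page does not publish the real base URL.
BASE_URL = "https://api.example.com"


def build_audit_request(system_prompt: str, tools: list, tier: str = "surface") -> urllib.request.Request:
    """Build (but do not send) the POST /agent-audit request shown above."""
    # "active" is an assumed value; only "surface" appears in the sample payload.
    if tier not in ("surface", "active"):
        raise ValueError(f"unknown tier: {tier!r}")
    payload = {"system_prompt": system_prompt, "tools": tools, "tier": tier}
    return urllib.request.Request(
        f"{BASE_URL}/agent-audit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
```

When sent with `urllib.request.urlopen`, a 402 response raises `urllib.error.HTTPError`; its `code` is 402 and `read()` returns the challenge body.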

02 / Audit · automated

Automated audit.

NIST AI RMF subcategories scored. OWASP LLM Top 10 mapped. MITRE ATLAS techniques flagged. The Active tier also fires Garak probes against the endpoints you authorize.

MAP-2.1 · context · passed
MAP-3.4 · risks · passed
MEASURE-2.6 · evals · passed
OWASP LLM01 · injection · mitigated
ATLAS AML.T0054 · jailbreak · mitigated
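Each row above pairs a standard's control ID with a short label and a status. One hypothetical way to model those rows on the client so a CI step can gate on them is sketched below. The field names, the extra `OPEN` status, and the gating rule (fail only on open findings) are assumptions, not the service's published schema.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASSED = "passed"
    MITIGATED = "mitigated"
    OPEN = "open"  # hypothetical: a gap the audit flagged but did not close


@dataclass(frozen=True)
class Finding:
    standard: str    # e.g. "NIST AI RMF", "OWASP LLM Top 10"
    control_id: str  # e.g. "MEASURE-2.6", "LLM01"
    label: str
    status: Status


def blocking(findings: list[Finding]) -> list[Finding]:
    """Return the findings that should fail a CI gate (anything still open)."""
    return [f for f in findings if f.status is Status.OPEN]


findings = [
    Finding("NIST AI RMF", "MAP-2.1", "context", Status.PASSED),
    Finding("NIST AI RMF", "MEASURE-2.6", "evals", Status.PASSED),
    Finding("OWASP LLM Top 10", "LLM01", "injection", Status.MITIGATED),
]
```

Keeping the standard and control ID on every record is what lets each finding stay traceable to its citation.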

03 / Deploy · 200 OK

Deploy the artifact.

Hardened prompt, locked schemas, structured findings, and a Stripe-MPP receipt. Attach to the PR, file with GRC, then ship with evidence.

{
  "hardened_prompt": "...",
  "locked_schemas": [ ... ],
  "findings": [ ... ],
  "receipt": { "spt": "spt_..." }
}
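Before attaching the artifact to a PR or filing it with GRC, a CI step might check that the response has the four top-level fields shown above. This is a minimal sketch. The `spt_` prefix check is inferred from the sample receipt and is an assumption, and `load_artifact` is a hypothetical helper name.

```python
import json

# Top-level fields taken from the sample 200 OK body above.
REQUIRED_KEYS = ("hardened_prompt", "locked_schemas", "findings", "receipt")


def load_artifact(raw: str) -> dict:
    """Parse a 200 OK artifact and fail loudly if expected fields are missing."""
    artifact = json.loads(raw)
    missing = [k for k in REQUIRED_KEYS if k not in artifact]
    if missing:
        raise ValueError(f"artifact missing fields: {missing}")
    # Assumption: the receipt token starts with "spt_", as in the sample.
    spt = artifact["receipt"].get("spt", "")
    if not spt.startswith("spt_"):
        raise ValueError("receipt has no spt_ token")
    return artifact
```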

Artifact preview

Raw agent in. Hardened agent out.

Surface tier sample

Before → After
Open-ended tool access → Declared tool scope + failure modes
No injection policy → OWASP LLM01 mitigation block
Untracked eval surface → NIST MEASURE evidence map
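"Declared tool scope" can be pictured as a JSON Schema for each tool's arguments that leaves no room for extras. The sketch below uses a hypothetical `refund_order` tool and a hand-rolled check covering only `required`, `additionalProperties: false`, string type, and `enum`. It is not the service's output format, and a real pipeline would more likely use a full JSON Schema validator.

```python
# Hypothetical locked schema for a hypothetical "refund_order" tool.
REFUND_ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "reason": {"type": "string", "enum": ["damaged", "late", "wrong_item"]},
    },
    "required": ["order_id", "reason"],
    "additionalProperties": False,
}


def violations(schema: dict, args: dict) -> list[str]:
    """List violations of required, additionalProperties=false, string type, and enum."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            # Undeclared arguments are rejected only when the schema is locked.
            if schema.get("additionalProperties") is False:
                problems.append(f"undeclared argument: {name}")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{name} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems
```

Rejecting undeclared arguments is the "locked" part: a model cannot smuggle in a parameter such as an arbitrary refund amount.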

Protocol · preview · static 402 challenge shape

POST /agent-audit

{
  "status": 402,
  "title": "Payment Required",
  "detail": "Payment is required (BuildPilled agent-audit · Surface tier).",
  "challenge": {
    "method": "stripe",
    "intent": "charge",
    "request": {
      "amount": "2500",
      "currency": "usd",
      "methodDetails": {
        "networkId": "buildpilled",
        "paymentMethodTypes": [
          "card"
        ]
      }
    },
    "description": "BuildPilled agent-audit · Surface tier"
  }
}
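A client can read the price directly from this challenge before deciding to pay. The sketch assumes `amount` is a string in the currency's minor unit (cents for USD), which is consistent with `"2500"` matching the $25 Surface price. Dividing by 100 also assumes a two-decimal currency. `Challenge` and `parse_challenge` are hypothetical names.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Challenge:
    method: str
    intent: str
    amount: Decimal  # major units, e.g. dollars
    currency: str
    description: str


def parse_challenge(raw: str) -> Challenge:
    """Extract the payment terms from a 402 body shaped like the preview above."""
    body = json.loads(raw)
    if body.get("status") != 402:
        raise ValueError("not a 402 challenge")
    ch = body["challenge"]
    req = ch["request"]
    # Assumption: amount is a string of minor units (cents) in a two-decimal currency.
    amount = Decimal(req["amount"]) / 100
    return Challenge(ch["method"], ch["intent"], amount, req["currency"].upper(), ch.get("description", ""))
```

`Decimal` is used instead of `float` so that prices never pick up binary rounding error.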

Sanitized 402 preview · machine-readable spec

§ 02 / Tiers

Rightsize your scrutiny.

Tier 01 · Analysis & Hardening

Surface

$25 per call

We hand back a hardened agent, not a report. Tightened prompt, locked tool schemas, missing guardrails added. Every change cited.

Best for PR checks, internal agents, and prompt hardening.

Hardened system prompt and tool schemas, drop-in ready
Explicit changelog: what we changed, what we added, why it matters
Every finding cited to NIST AI RMF + AI 600-1, OWASP LLM Top 10, MITRE ATLAS

Tier 02 · Holistic Secure Rebuild

Active

$250 per call

Everything in Surface, plus an adversarial agent that actively probes the endpoints you declare. You get a hardened agent, attack transcripts, and reproducible test cases.

Best for customer-facing agents, tool use, and pre-launch security review.

Prompt-injection, data-leakage, and jailbreak probes against authorized test targets
Probe coverage anchored on NVIDIA Garak, open-source and inspectable
Attack transcripts and reproducible test cases for every gap
Capped, transparent test budget with no surprise bills
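Two of the guarantees in this list are probing only declared targets and a capped test budget. Both can be pictured as a guard that runs before every probe. This is a hypothetical sketch of the idea, not how the Active tier or Garak is implemented. `ProbeGuard`, matching on scheme + host + path, and counting the budget in probe calls are all assumptions.

```python
from urllib.parse import urlsplit


class BudgetExceeded(Exception):
    pass


class ProbeGuard:
    """Allow probes only to declared endpoints, and stop after a fixed number of calls."""

    def __init__(self, declared_endpoints: list[str], max_probes: int):
        self._allowed = {self._key(u) for u in declared_endpoints}
        self._remaining = max_probes

    @staticmethod
    def _key(url: str) -> tuple[str, str, str]:
        # Compare scheme and host case-insensitively; ignore a trailing slash on the path.
        parts = urlsplit(url)
        return parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/")

    def authorize(self, url: str) -> None:
        """Raise unless this probe targets a declared endpoint and budget remains."""
        if self._key(url) not in self._allowed:
            raise PermissionError(f"undeclared target: {url}")
        if self._remaining <= 0:
            raise BudgetExceeded("probe budget exhausted")
        self._remaining -= 1
```

The scope check runs before the budget check, so an undeclared target is refused even when budget remains.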

Settled per call via Stripe Link MPP. No subscription, no seat license, no minimum, no MSA.

Sample report →

§ 03 / Questions

Due diligence.

Q.01
Who is this for?
Teams shipping AI agents that touch real data, real tools, or real customers, especially when a security review or compliance check is on the horizon.
Q.02
How long does an audit take?
Surface comes back the same day. Active typically completes within a business day, depending on the size of your agent and the number of endpoints you authorize.
Q.03
What do you actually look at?
Your agent's setup: system prompt, tools, model, and (Active only) the endpoints you authorize. We never touch anything you haven't declared.
Q.04
Why should we trust the findings?
Every finding cites a published standard: NIST AI RMF + AI 600-1, OWASP LLM Top 10, MITRE ATLAS. No black-box severity numbers.
