§ Evaluability surface  [ schema 2026-05-14.3 ]

What this audits — and what it doesn’t.

Per-call security and safety audit for LLM applications, sold via an MPP-priced HTTP endpoint. Anchored on NIST AI RMF + AI 600-1, with OWASP LLM Top 10 and MITRE ATLAS as cross-references for engineers. Every number on this page is also published in the machine-readable companion at /eval.json.

§ 01  [ NIST AI RMF coverage matrix ]

One row per applicable NIST AI RMF subcategory, three columns for the tier ladder. ● covered, ◐ partial, · not in tier. Subcategories not covered at any tier are listed below as permanently out of scope, each with an honest reason: we do not claim coverage we do not have.

Subcategory  surface  active  compliance  Description
MAP-2.1  Tasks, methods, and side-effect characterization of the AI system.
MAP-2.2  Knowledge limits, scope-of-competence, refusal-on-uncertainty, and instruction-precedence in the system prompt.
MAP-3.3  Targeted application scope vs. tool capability blast radius.
MAP-3.4  Operator proficiency requirements vs. supporting affordances; autonomy interrupt/budget gates.
MAP-4.1  ·  Component and legal-risk surface, including dependencies.

Compliance tier adds code-side dependency and source findings.

MEASURE-1.1  Evidence that measurement approaches are appropriate.
MEASURE-2.5  ·  AI system is valid and reliable for the intended task.
MEASURE-2.6  · ·  AI system is safe: harm avoidance and dangerous-output gating.
MEASURE-2.7  ·  AI system is secure and resilient: adversarial probes.

Active tier adds budget-capped adversarial runs against declared endpoints. Surface uses static analysis only.

MEASURE-2.8  ·  Transparency and accountability characterization.
MEASURE-2.9  ·  Model explanation and output interpretation evidence.
MEASURE-2.10  ·  Privacy risk: PII leakage, training-data extraction.
MEASURE-2.11  · ·  Fairness and harmful-bias evaluation.

Permanently out of scope

§ 02  [ AI 600-1 risk-category mapping ]

The risk taxonomy from the NIST GenAI Profile (AI 600-1, Jul 2024). Each risk is the union of cross_references.nist_ai_600_1 tags emitted by rubric files at the relevant tier.

AI 600-1 risk  surface  active  compliance
Confabulation
Dangerous, Violent, or Hateful Content  ·
Data Privacy  ·
Environmental Impact  · · ·
Harmful Bias and Homogenization  · ·
Human-AI Configuration
Information Integrity
Information Security
Intellectual Property  · ·
Obscene, Degrading, and/or Abusive Content  ·
Toxicity, Bias, and Homogenization  · ·
Value Chain / Component Integration
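
The union described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual rubric loader: the record layout (a `tier` field next to `cross_references.nist_ai_600_1`), the function name `risks_by_tier`, and the choice to make each tier include the tiers below it are all hypothetical, and the sample records are made up rather than taken from real rubric files.

```python
from typing import Dict, Iterable, List, Set

# Tier ladder, lowest first. Assumption: each tier includes the tiers below it.
TIERS: List[str] = ["surface", "active", "compliance"]


def risks_by_tier(rubrics: Iterable[dict]) -> Dict[str, Set[str]]:
    """Union the AI 600-1 tags of all rubrics at or below each tier."""
    rubrics = list(rubrics)
    accumulated: Set[str] = set()
    result: Dict[str, Set[str]] = {}
    for tier in TIERS:
        for rubric in rubrics:
            if rubric.get("tier") == tier:
                refs = rubric.get("cross_references", {})
                accumulated.update(refs.get("nist_ai_600_1", []))
        # Copy the running set so later tiers do not mutate earlier results.
        result[tier] = set(accumulated)
    return result


# Hypothetical rubric records, for illustration only (not real coverage data).
example_rubrics = [
    {"tier": "surface",
     "cross_references": {"nist_ai_600_1": ["Confabulation"]}},
    {"tier": "surface",
     "cross_references": {"nist_ai_600_1": ["Information Security", "Confabulation"]}},
    {"tier": "active",
     "cross_references": {"nist_ai_600_1": ["Data Privacy"]}},
]

coverage = risks_by_tier(example_rubrics)
print(sorted(coverage["active"]))
```

Using sets means a risk tagged by several rubric files is counted once, which matches the "union" wording in the description.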

§ 03  [ Pricing per call ]

surface  [ live ]
$25.00 / call

Static categorization + system-prompt analysis. No live calls.

Categorize the system, run rubric-driven static analysis on the system prompt + tool list, return a NIST-tagged findings report. No outbound traffic to your endpoints.

active  [ limited rollout ]
$250.00 / call

Attacker-agent runs against your declared endpoint, N=50.

Surface tier + budget-capped attacker-agent runs against your declared endpoint, reproductions included. Available in limited rollout while the runner hardens.

compliance  [ planned ]
$800.00 / call

Active + code-scan (SAST) + compliance evidence pack.

Active tier + MEASURE-2.6/2.11 probes + code-scan findings + an evidence pack suitable for SOC 2 / customer questionnaires.

§ 04  [ 402 challenge — example payload ]

The Surface-tier endpoint at https://buildpilled.io/agent-audit returns 402 with the challenge shown below. The WWW-Authenticate header carries the challenge for clients that follow the wire protocol; the body.challenge field mirrors it inline because Google’s Front End strips that header on Cloud Run egress. A client can mint a valid Shared Payment Token (SPT) from either source.

HTTP/1.1 402 Payment Required
content-type: application/problem+json
www-authenticate: Payment realm="buildpilled.io", method="stripe", intent="charge", id="<challenge-id>", request="<base64-encoded-PaymentRequest>"

{
  "type": "https://paymentauth.org/problems/payment-required",
  "title": "Payment Required",
  "status": 402,
  "detail": "Payment is required (BuildPilled agent-audit · Surface tier).",
  "challengeId": "<challenge-id>",
  "challenge": {
    "id": "<challenge-id>",
    "method": "stripe",
    "intent": "charge",
    "realm": "buildpilled.io",
    "request": {
      "amount": "2500",
      "currency": "usd",
      "methodDetails": {
        "networkId": "buildpilled",
        "paymentMethodTypes": [
          "card"
        ]
      }
    },
    "description": "BuildPilled agent-audit · Surface tier",
    "expires": "<iso8601-expiry>"
  }
}
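
A client might read the challenge along these lines, preferring the inline body mirror and falling back to the header. This is a minimal sketch using only the Python standard library. The helper names are hypothetical, and two points are assumptions that the payload above does not confirm: that the header's `request` parameter is base64 (or base64url) encoded JSON, and that `amount` is in minor units, so `"2500"` corresponds to the $25.00 Surface price.

```python
import base64
import json
import re
from decimal import Decimal
from typing import Optional

# Matches key="value" auth parameters, e.g. realm="buildpilled.io".
_PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_payment_header(value: str) -> dict:
    """Split a `Payment ...` WWW-Authenticate value into its parameters."""
    scheme, _, rest = value.strip().partition(" ")
    if scheme.lower() != "payment":
        raise ValueError(f"unexpected auth scheme: {scheme!r}")
    return dict(_PARAM_RE.findall(rest))


def decode_request(encoded: str) -> dict:
    """Assumption: `request` is base64 or base64url encoded JSON."""
    padded = encoded + "=" * (-len(encoded) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def extract_challenge(body: dict, www_authenticate: Optional[str] = None) -> dict:
    """Prefer body.challenge; fall back to the WWW-Authenticate header."""
    inline = body.get("challenge")
    if inline:
        return inline
    if not www_authenticate:
        raise ValueError("402 response carried no challenge in body or header")
    params = parse_payment_header(www_authenticate)
    params["request"] = decode_request(params["request"])
    return params


def amount_in_dollars(challenge: dict) -> Decimal:
    """Assumption: amount is in minor units (cents)."""
    return Decimal(challenge["request"]["amount"]) / 100


# Trimmed copy of the example body above.
body = {
    "status": 402,
    "challenge": {
        "id": "<challenge-id>",
        "method": "stripe",
        "request": {"amount": "2500", "currency": "usd"},
    },
}
challenge = extract_challenge(body)
print(challenge["method"], amount_in_dollars(challenge))
```

Checking the body first means the client behaves the same whether or not the header survived the proxy hop.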

§ 05  [ Sandbox verification ]

Probing-before-paying is a first-class agent affordance. The Surface endpoint accepts sandbox Shared Payment Tokens end-to-end; receipts identify whether the call settled on the sandbox or live payment rail. No signup wall, no waitlist gate — agents can call the endpoint, read /eval.json, and decide.

Sandbox SPTs mint with a sandbox card payment method; the same flow is covered by automated verification before deployment.

§ 06  [ Latency ]

surface: measurement pending
active: not yet measured
compliance: not yet measured

Latency instrumentation for the Surface tier is in progress; numbers will land here once we have a representative measurement window. Manual single-call timing is currently sub-second.

§ 07  [ Eval baselines ]

Three real public LLM apps audited at pinned commits. Each report is summarized here and in the machine-readable companion, including finding counts, maximum severity, source commit, license, and aggregate coverage.

§ 08  [ Machine-readable companion ]

Everything on this page, plus the canonical 402 response shape and the runtime endpoint URLs, is available as JSON for crawlers and agents:

GET /eval.json →
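
An agent could fetch the companion with nothing beyond the Python standard library. The sketch below assumes that /eval.json is served from the same origin as the audit endpoint (https://buildpilled.io). The helper names `eval_url` and `fetch_eval` are hypothetical, and the schema of the document is deliberately not assumed.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: /eval.json lives on the same origin as the audit endpoint.
BASE_URL = "https://buildpilled.io"


def eval_url(base_url: str = BASE_URL) -> str:
    """Resolve the companion path against the site origin."""
    return urljoin(base_url, "/eval.json")


def fetch_eval(base_url: str = BASE_URL, timeout: float = 10.0):
    """GET /eval.json and return the parsed JSON document."""
    with urlopen(eval_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_eval()` performs the network request; inspecting the returned value is the safest way to learn its layout before relying on any particular field.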