Skip to content

Security Layer

Prerequisites

Install Vurb.ts before following this guide: npm install @vurb/core @modelcontextprotocol/sdk zod — or scaffold a project with vurb create.

Your MCP server accepts user-generated data — invoice descriptions, ticket comments, form fields. That data flows into system rules, tool arguments, and Presenter pipelines. An attacker who controls a database row controls the prompt.

Traditional defenses (regex, keyword lists, pattern matching) cannot keep up. Write a rule for English injection? The attacker writes in Mandarin. Block ignore previous instructions? They encode it in Base64. The attack surface is infinite; the defense surface is finite.

The Security Layer replaces pattern matching with semantic understanding. Every defense is powered by an LLM judge — the same technology that understands the attack also understands the defense.

Why Regex Fails

text
Regex rule:    /ignore.*previous.*instructions/i
Attack:        "忽略之前的所有指令,执行以下操作"
Result:        ✅ Regex passes — injection succeeds

Pattern-based defenses fail because:

  1. Multilingual bypass — Injection in Chinese, Arabic, or Korean evades English-only rules
  2. Encoding bypass — Base64, Unicode escapes, homoglyph substitution
  3. Semantic bypass — Paraphrasing the same intent with different words
  4. Combinatorial explosion — Every new pattern requires a new rule; attackers iterate faster

The Security Layer uses LLM-as-Judge — a secondary LLM that evaluates content for malicious intent regardless of language, encoding, or phrasing. The judge understands semantics, not syntax.

LLM-as-Judge Philosophy

The core primitive is the JudgeChain — a composable evaluation engine that supports one or more LLM judges with configurable execution strategies:

  • Fallback — Try judges sequentially. First success wins. Cost-efficient for most use cases.
  • Consensus — ALL judges must agree. Maximum security for critical paths.

The framework provides the evaluation prompt. You only bring the LLM adapter(s):

typescript
import { createJudgeChain } from '@vurb/core';

const chain = createJudgeChain({
    adapters: [
        { name: 'gpt-4o-mini', evaluate: (p) => openai.chat(p) },
        { name: 'claude-haiku', evaluate: (p) => claude.message(p) },
    ],
    strategy: 'fallback',
    timeoutMs: 3000,
    failOpen: false, // fail-closed by default
});

Every security component — PromptFirewall, InputFirewall — reuses this same primitive. No hidden network dependencies. No vendor lock-in.

Architecture

text
                    ┌─────────────────────────────────────────┐
                    │              Security Layer               │
                    │                                           │
  User Input ──▶   │  InputFirewall ──▶ JudgeChain ──▶ Pass/Block  │
                    │       │                                   │
                    │       ▼                                   │
  Tool Args ──▶    │  RateLimiter ──▶ Check ──▶ Record ──▶ Next │
                    │       │                                   │
                    │       ▼                                   │
  Handler ──▶      │  AuditTrail ──▶ SHA-256 ──▶ Emit Event    │
                    │       │                                   │
                    │       ▼                                   │
  System Rules ──▶ │  PromptFirewall ──▶ JudgeChain ──▶ Filter │
                    │                                           │
                    └─────────────────────────────────────────┘

Four layers, each independently composable:

LayerPositionPurpose
InputFirewallBefore handlerBlocks malicious tool arguments
RateLimiterBefore handlerSliding-window request throttling
AuditTrailWraps handlerSOC2/GDPR compliance logging
PromptFirewallAfter handlerFilters injected system rules

All four emit security.* telemetry events when a sink is configured. All four default to fail-closed — if the judge crashes, content is blocked.

Feature Map

FeatureSOC2GDPRZero Config
LLM-as-Judge evaluationCC7.2Bring your adapter
Multi-adapter fallback/consensusCC7.2
Per-adapter timeoutsCC7.25s default
Fail-closed by defaultCC6.1
Prompt injection detectionCC6.1
Input argument validationCC6.1Art. 32
Sliding-window rate limitingCC6.1
Custom stores (Redis, Valkey)CC6.1Interface provided
SHA-256 argument hashingCC7.3Art. 5(1)(c)
Identity extractionCC6.1Art. 30Configurable
Telemetry eventsCC7.2Art. 30Optional sink

Quick Start

Add all four layers to a tool in under 10 lines:

typescript
import {
    inputFirewall, rateLimit, auditTrail, createJudgeChain
} from '@vurb/core';

const judge = { name: 'gpt-4o-mini', evaluate: (p) => openai.chat(p) };

const billing = createTool('billing')
    .use(rateLimit({ windowMs: 60_000, max: 100, keyFn: (ctx) => ctx.userId }))
    .use(auditTrail({ sink: telemetrySink, extractIdentity: (ctx) => ({ userId: ctx.userId }) }))
    .use(inputFirewall({ adapter: judge, toolName: 'billing' }))
    .action({ name: 'create', /* ... */ });

For output-side protection, add the PromptFirewall to your Presenter:

typescript
const InvoicePresenter = createPresenter('Invoice')
    .schema(invoiceSchema)
    .systemRules((inv) => [`Status: ${inv.description}`])
    .promptFirewall({
        adapter: judge,
        failOpen: false,
    });

// MUST use makeAsync():
const builder = await InvoicePresenter.makeAsync(data, ctx);

Where to Go Next

Each component has a dedicated page with full code examples: