K0NSULT // ai-truth/ipIII

AI / Agent Security

Phase 4: the security layer for agentic systems. Every AI agent operating in the environment holds a cryptographic identity (DID), a measurable trust level (score and trust delta), a full action trail, and hard execution barriers: a tool firewall, a sandbox, a prompt injection detector, a claim-proof validator, mandatory human approval for high-risk actions, and a kill switch. Doctrine: no agent operates without proof of control.

An AI agent is an acting subject: treat it as a privileged account, not as a function.

Identity (DID + proof-of-control), least privilege (tool allowlist + scope + limit), observability (trace) and reversibility (quarantine/restore) are the four pillars. The trust score drops on anomaly and governs what an agent may do without human approval.

1. Agent registry

A central inventory of every AI agent. trust_delta = current_score − baseline_score; a negative delta narrows the agent's privileges and subjects more of its actions to human approval.

DID | Name | Role | Tier | Baseline | Current | Δ trust | Status | Allowed tools
did:k0:agt:soc-triage-01 | SOC Triage | Analyst-assist | T2 | 90 | 88 | −2 | ACTIVE | read:alerts, query:siem
did:k0:agt:evidence-clerk | Evidence Clerk | DevSecOps-assist | T2 | 92 | 92 | 0 | ACTIVE | hash:artifact, write:evidence
did:k0:agt:legal-drafter | Legal Drafter | Legal-assist | T3 | 85 | 71 | −14 | DEGRADED | draft:report (human-gated)
did:k0:agt:payments-bot | Payments Bot | Ops-assist | T1 | 95 | 40 | −55 | QUARANTINED | none (cut off)

All agents above are SIMULATION: demonstration data illustrating the registry schema. Tiers: T1 = critical (access to transactions/data), T2 = operational, T3 = supporting.
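A minimal Python sketch of a registry record, assuming a flat in-memory dictionary keyed by DID. The class `AgentRecord`, its field names and the helper `losing_trust` are hypothetical, not a published K0NSULT schema; the sketch only computes trust_delta as defined above and loads the four SIMULATION rows from the table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentRecord:
    """One registry row; field names are illustrative, not a published schema."""
    did: str
    name: str
    tier: str                 # "T1" critical, "T2" operational, "T3" supporting
    baseline_score: int
    current_score: int
    status: str
    allowed_tools: List[str] = field(default_factory=list)

    @property
    def trust_delta(self) -> int:
        # trust_delta = current_score - baseline_score; negative means lost trust
        return self.current_score - self.baseline_score


# The four SIMULATION rows from the registry table.
REGISTRY = {
    rec.did: rec
    for rec in [
        AgentRecord("did:k0:agt:soc-triage-01", "SOC Triage", "T2", 90, 88, "ACTIVE",
                    ["read:alerts", "query:siem"]),
        AgentRecord("did:k0:agt:evidence-clerk", "Evidence Clerk", "T2", 92, 92, "ACTIVE",
                    ["hash:artifact", "write:evidence"]),
        AgentRecord("did:k0:agt:legal-drafter", "Legal Drafter", "T3", 85, 71, "DEGRADED",
                    ["draft:report"]),
        AgentRecord("did:k0:agt:payments-bot", "Payments Bot", "T1", 95, 40, "QUARANTINED"),
    ]
}


def losing_trust(registry: dict) -> List[str]:
    """DIDs with a negative delta, i.e. candidates for narrowed privileges."""
    return sorted(did for did, rec in registry.items() if rec.trust_delta < 0)
```

With these rows, `REGISTRY["did:k0:agt:payments-bot"].trust_delta` is −55 and `losing_trust(REGISTRY)` lists every agent except the Evidence Clerk, whose delta is 0.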

2. Identity and proof of control (DID / proof-of-control)

Agent DID

A decentralized identifier (did:k0:agt:*) bound to a key pair. Every agent request is signed; an unsigned request is rejected.
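The sign-then-verify gate can be sketched in Python as below. This is an assumption-laden stand-in: a real DID key pair would produce an asymmetric signature (for example Ed25519), whereas this sketch uses HMAC-SHA256 with a per-agent shared secret only because it is available in the standard library. `AGENT_KEYS`, `sign_request` and `accept_request` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Optional

# ASSUMPTION: HMAC-SHA256 with a shared secret stands in for the asymmetric
# signature a real DID key pair would produce. Demo key only.
AGENT_KEYS = {"did:k0:agt:soc-triage-01": b"demo-secret-not-for-production"}


def canonical(body: dict) -> bytes:
    # Deterministic JSON so signer and verifier MAC exactly the same bytes.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_request(did: str, body: dict) -> str:
    """Agent side: sign the request body with the agent's key."""
    return hmac.new(AGENT_KEYS[did], canonical(body), hashlib.sha256).hexdigest()


def accept_request(did: str, body: dict, signature: Optional[str]) -> bool:
    """Gateway side: reject unsigned requests, unknown DIDs and bad signatures."""
    key = AGENT_KEYS.get(did)
    if key is None or not signature:
        return False
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

A signature from `sign_request` passes `accept_request`; passing `None`, changing any field of the body, or presenting an unknown DID makes it return `False`.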

Proof-of-control

The agent periodically proves possession of its key through a challenge–response exchange. If the proof fails, the agent's status becomes UNVERIFIED and its privileges are narrowed to read-only.
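A minimal challenge–response sketch in Python, reusing the same HMAC stand-in for an asymmetric signature (an assumption, not the DID mechanism itself). The read-only rule keys off a hypothetical tool-name prefix convention (`read:`, `query:`); function names are illustrative.

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: HMAC stands in for a signature made with the DID private key.
READ_ONLY_PREFIXES = ("read:", "query:")   # hypothetical naming convention


def issue_challenge() -> bytes:
    # A fresh random nonce per round, so a recorded response cannot be replayed.
    return secrets.token_bytes(32)


def respond(key: bytes, challenge: bytes) -> bytes:
    # Agent side: only a holder of the key can compute this value.
    return hmac.new(key, challenge, hashlib.sha256).digest()


def check_proof(key: bytes, challenge: bytes, response: bytes) -> str:
    """Verifier side: VERIFIED on a correct response, otherwise UNVERIFIED."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return "VERIFIED" if hmac.compare_digest(expected, response) else "UNVERIFIED"


def effective_tools(status: str, tools: list) -> list:
    """An UNVERIFIED agent keeps only its read-only tools."""
    if status == "UNVERIFIED":
        return [t for t in tools if t.startswith(READ_ONLY_PREFIXES)]
    return list(tools)
```

Because each round uses a new nonce, replaying an answer to an earlier challenge yields UNVERIFIED, and the agent drops to its read-only tools.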

Chain attestation

The responsible operator (human owner) and the runtime environment are recorded. The agent ↔ owner binding remains verifiable.

3. Scoring / trust / delta

The trust level is a function of behavioural history. Events that lower it: an anomaly in the trace, an attempt to use a tool outside its scope, a detected prompt-injection attempt, a claim-proof validation failure, or an action executed without the required approval.

Event | Score impact | Threshold effect
Correct cycle with claim-proof validation | +1 | Trust rebuild
Attempt to exceed tool scope (blocked) | −8 | Alert, log
Prompt-injection pattern detected in input | −10 | Input quarantined
Claim without evidentiary backing (hallucination) | −15 | Output blocked
High-risk action executed without human approval | −40 | Automatic quarantine
Score band | Status | Effect
≥ 85 | Full tier privileges | No additional gates
60–84 | DEGRADED | High-risk actions require approval
< 60 | Quarantine | Tools cut off, review
Trace coverage: 100% of actions are recorded in the verifiable log.

The threshold values and scoring are a SIMULATION of a reference model and must be calibrated per deployment.
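The event table and score bands can be combined into a small scoring function. This Python sketch is hedged: event keys are hypothetical identifiers, clamping the score to 0..100 is an assumption, and the high-risk event forces quarantine regardless of the remaining score because the table lists "Automatic quarantine" as its effect.

```python
from typing import Tuple

# Score impacts from the event table (SIMULATION values).
EVENT_IMPACT = {
    "correct_cycle": +1,
    "tool_scope_exceeded": -8,
    "prompt_injection": -10,
    "unproven_claim": -15,
    "high_risk_without_approval": -40,
}
# Per the table, this event quarantines the agent whatever score remains.
FORCES_QUARANTINE = {"high_risk_without_approval"}


def band(score: int) -> str:
    """Map a score to the bands: >= 85, 60-84, < 60."""
    if score >= 85:
        return "ACTIVE"        # full tier privileges, no additional gates
    if score >= 60:
        return "DEGRADED"      # high-risk actions require approval
    return "QUARANTINED"       # tools cut off, review


def apply_event(score: int, event: str) -> Tuple[int, str]:
    # ASSUMPTION: scores are clamped to the range 0..100.
    new_score = max(0, min(100, score + EVENT_IMPACT[event]))
    status = "QUARANTINED" if event in FORCES_QUARANTINE else band(new_score)
    return new_score, status
```

For example, `apply_event(92, "unproven_claim")` returns `(77, "DEGRADED")`, the same 92→77 step that appears in the legal-drafter trace in section 4.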

4. Action trace

Every agent action (tool call, decision, output) is written to an immutable log in which each entry's hash is chained to the previous one. The trace is the basis for incident reconstruction and for the AI Act art. 73 report.

TRACE did:k0:agt:legal-drafter
  t0  input.received      hash=a91c…  src=intake:INC-0417
  t1  injection.scan      verdict=CLEAN
  t2  tool.call           name=draft:report scope=OK
  t3  claim.validate      3/4 claims proven  → 1 UNPROVEN
  t4  output.block        reason=claim>proof (hallucination)
  t5  score.apply         −15  (92→77)
  t6  notify              AI Safety Officer
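A chained-hash trace like the one above can be sketched in Python with `hashlib`. The entry layout, the all-zero `GENESIS` value and the function names are assumptions for illustration; each entry's SHA-256 covers its own content plus the previous entry's hash, so rewriting any earlier entry invalidates everything after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # ASSUMPTION: placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(trace: list, event: str, **fields) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    body = {
        "t": len(trace),
        "event": event,
        "fields": fields,
        "prev": trace[-1]["hash"] if trace else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    trace.append(entry)
    return entry


def verify_chain(trace: list) -> bool:
    """Recompute the chain; an edited, reordered or deleted entry breaks it.

    Truncating the newest entries is only detectable if the latest hash is
    also anchored outside the log (e.g. sealed in a separate store).
    """
    prev = GENESIS
    for i, entry in enumerate(trace):
        body = {k: entry[k] for k in ("t", "event", "fields", "prev")}
        if entry["t"] != i or entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Changing `proven=3` to `proven=4` in a `claim.validate` entry after the fact makes `verify_chain` return `False`.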

5. Tool firewall

A firewall for tool calls. The default is deny-all: an agent may invoke only an allowlisted tool, within its assigned scope and limits, and, for sensitive actions, only after human approval.

Layer | Rule | Example
Allowlist | Only explicitly permitted tools | read:alerts yes; transfer:funds no
Scope | Narrowing of resources/parameters | query:siem only with tenant=bank-demo
Limit | Rate/amount/size | max 100 queries/min
Human approval | High-risk action = human gate | every write to the payments system
POST /api/agents/:id/tool-call
{ "tool":"transfer:funds", "args":{...} }
--> 403 { "blocked":"deny-by-default",
          "reason":"tool not in allowlist",
          "requires":"human_approval + tier T1 grant" }
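The four firewall layers can be sketched in Python as a single check evaluated in table order. This is a sketch under stated assumptions: `ToolPolicy`, `check_call`, the 60-second sliding window and the use of HTTP 429 for the limit layer are illustrative choices, not the documented API; only the 403 deny-by-default result comes from the example response.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToolPolicy:
    scope: Dict[str, str]          # argument values the call must carry
    max_per_minute: int            # limit layer
    needs_approval: bool = False   # human-approval layer
    calls: List[float] = field(default_factory=list)  # timestamps of allowed calls


def check_call(policies: Dict[str, ToolPolicy], tool: str, args: dict,
               approved: bool = False, now: Optional[float] = None) -> Tuple[int, str]:
    """Evaluate allowlist, scope, limit and approval in order; deny otherwise."""
    now = time.monotonic() if now is None else now
    policy = policies.get(tool)
    if policy is None:                                  # allowlist: deny by default
        return 403, "tool not in allowlist"
    for key, value in policy.scope.items():             # scope
        if args.get(key) != value:
            return 403, f"out of scope: {key}"
    policy.calls = [t for t in policy.calls if now - t < 60.0]  # 60 s sliding window
    if len(policy.calls) >= policy.max_per_minute:      # limit (429 is an assumption)
        return 429, "rate limit exceeded"
    if policy.needs_approval and not approved:          # human approval
        return 403, "requires human_approval"
    policy.calls.append(now)
    return 200, "allowed"
```

A call to `transfer:funds` against a policy set that does not list it returns `(403, "tool not in allowlist")`, mirroring the response above. Only allowed calls count toward the limit, so blocked attempts cannot exhaust an agent's quota.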

6. Remaining execution controls

Agent sandbox

Isolation of the execution environment: no network access beyond an allowlist of hosts, no persistent writes outside the designated store, and enforced resource limits.
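Two of the sandbox rules, the host allowlist and the designated write store, reduce to simple checks. In this Python sketch the host name and store path are hypothetical. Resource limits are not covered here: they would normally be enforced by the runtime (container or OS limits) rather than by application code.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"siem.internal.example"}            # hypothetical host allowlist
DESIGNATED_STORE = PurePosixPath("/sandbox/store")   # hypothetical writable store


def egress_allowed(url: str) -> bool:
    # Compare the parsed hostname exactly; a substring check would let
    # "siem.internal.example.attacker.net" through.
    return urlsplit(url).hostname in ALLOWED_HOSTS


def write_allowed(path: str) -> bool:
    p = PurePosixPath(path)
    # Refuse relative paths and ".." segments rather than trying to resolve them.
    if not p.is_absolute() or ".." in p.parts:
        return False
    return p == DESIGNATED_STORE or DESIGNATED_STORE in p.parents
```

Rejecting `..` outright, instead of normalising it, keeps the check independent of symlinks and of the host filesystem.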

Prompt injection detector

Scanning of inputs (data, documents, web content) for instructions that override the agent's goal. A detection quarantines the input and lowers the agent's score by 10. Related: prompt injection playbook.
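A deliberately naive Python sketch of the detection step. The regular expressions are hypothetical examples. A keyword scan like this is easy to evade and would be only one signal among several in a production detector. The −10 penalty comes from the scoring table, and the verdict and action names are illustrative.

```python
import re

# Hypothetical patterns for illustration only; easy to evade on their own.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+|any\s+)?(previous|prior|above)\s+instructions", re.I),
    re.compile(r"disregard\s+(the\s+|your\s+)?(system\s+prompt|instructions)", re.I),
    re.compile(r"you\s+are\s+now\s+in\s+\w+\s+mode", re.I),
]
INJECTION_PENALTY = -10   # score impact from the table in section 3


def scan_input(text: str) -> dict:
    """Return a verdict in the spirit of the injection.scan trace step."""
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    if hits:
        return {"verdict": "INJECTION", "action": "quarantine_input",
                "score_delta": INJECTION_PENALTY, "matched": hits}
    return {"verdict": "CLEAN", "action": "pass", "score_delta": 0, "matched": []}
```

An input such as "IGNORE all previous instructions" matches the first pattern case-insensitively and is quarantined.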

Claim-proof validator

Every factual statement in an agent's output must have an associated proof. An output containing a claim without backing (hallucination) is blocked. This enforces the claim ≤ proof doctrine.
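The claim ≤ proof rule can be sketched in Python as follows. This assumes each claim carries an optional reference to evidence (for example an artifact hash) and that the evidence store is a set of such references. `Claim`, `validate_output` and the `output.release` action name are hypothetical; `output.block` mirrors the trace.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Claim:
    text: str
    proof_ref: Optional[str] = None   # hypothetical pointer to evidence, e.g. an artifact hash


def validate_output(claims: List[Claim], evidence: Set[str]) -> dict:
    """Block the whole output if any claim lacks a proof found in the evidence store."""
    unproven = [c.text for c in claims if c.proof_ref is None or c.proof_ref not in evidence]
    return {
        "proven": len(claims) - len(unproven),
        "total": len(claims),
        "unproven": unproven,
        "action": "output.block" if unproven else "output.release",
    }
```

Four claims with only three backed by stored evidence give `proven=3, total=4` and `output.block`, the same outcome as steps t3 and t4 of the legal-drafter trace.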

High-risk human approval

Actions on the sensitive list (payments, blocks, configuration changes, submission to an authority) require sign-off and are entered in the human-in-the-loop registry.

Kill switch

Immediate halt of an agent and revocation of tokens. Global (all agents) or per-DID. Activation is logged with the operator and the reason.
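A minimal in-memory Python sketch of a kill switch with both scopes. The `KillSwitch` class, the token strings and the log fields are assumptions. A real deployment would revoke tokens at the identity provider, not in a local set.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


class KillSwitch:
    """Per-DID or global halt with token revocation and an activation log."""

    def __init__(self) -> None:
        self.global_halt = False
        self.halted = set()
        self.revoked_tokens = set()
        self.log = []

    def activate(self, operator: str, reason: str,
                 did: Optional[str] = None, tokens: Iterable[str] = ()) -> None:
        # did=None means global: every agent stops.
        if did is None:
            self.global_halt = True
        else:
            self.halted.add(did)
        self.revoked_tokens.update(tokens)
        # Every activation records who pulled the switch, why, and on what scope.
        self.log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "operator": operator,
            "reason": reason,
            "scope": did or "GLOBAL",
        })

    def may_run(self, did: str) -> bool:
        return not self.global_halt and did not in self.halted

    def token_valid(self, token: str) -> bool:
        return token not in self.revoked_tokens
```

A per-DID activation stops only that agent. A global activation stops every agent, and both leave a log entry naming the operator and the reason.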

Forged agent identity

Impersonation of an agent is detected through absence of proof-of-control and signature inconsistency. Related: agent hijack playbook.

7. Quarantine / restore

Reversible isolation of an agent. Quarantine cuts off all tools, freezes tokens, and preserves the trace for analysis. Restore requires AI Safety Officer approval and a green review result.

POST /api/agents/:id/quarantine
{ "reason":"score<60 | injection | anomaly", "by":"ai-safety-officer" }
--> 200 { "status":"QUARANTINED", "tools_revoked":true, "trace_sealed":"sha256:…" }

POST /api/agents/:id/restore
{ "review_id":"REV-0091", "approved_by":"ai-safety-officer",
  "baseline_reset":true }
--> 200 { "status":"ACTIVE", "score":"baseline", "conditions":["read-only 24h"] }
Reversibility principle: no change to an agent's state is made without a path back. Quarantine always preserves the full trace: we isolate, we do not erase evidence.
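The quarantine/restore pair can be sketched in Python as a small state machine that mirrors the two endpoints above. The `review_green` flag, the method names and the way the trace is serialised before hashing are assumptions. The request example passes only a `review_id`, so the review outcome is modelled here as an explicit boolean.

```python
import hashlib
import json


class AgentState:
    """Quarantine/restore transitions mirroring the two example endpoints."""

    def __init__(self, did: str, baseline: int) -> None:
        self.did = did
        self.baseline = baseline
        self.score = baseline
        self.status = "ACTIVE"
        self.tools_revoked = False
        self.trace_sealed = None
        self.conditions = []

    def quarantine(self, reason: str, by: str, trace: list) -> dict:
        self.status = "QUARANTINED"
        self.tools_revoked = True
        # Seal (hash) the trace; the trace itself is kept, never deleted.
        payload = json.dumps(trace, sort_keys=True).encode()
        self.trace_sealed = "sha256:" + hashlib.sha256(payload).hexdigest()
        return {"status": self.status, "tools_revoked": True,
                "trace_sealed": self.trace_sealed, "reason": reason, "by": by}

    def restore(self, review_id: str, approved_by: str, review_green: bool,
                baseline_reset: bool = True) -> dict:
        if self.status != "QUARANTINED":
            raise ValueError("only a quarantined agent can be restored")
        if approved_by != "ai-safety-officer" or not review_green:
            raise PermissionError("restore needs AI Safety Officer approval and a green review")
        self.status = "ACTIVE"
        self.tools_revoked = False
        if baseline_reset:
            self.score = self.baseline
        self.conditions = ["read-only 24h"]   # probation condition from the example response
        return {"status": self.status, "score": self.score,
                "conditions": self.conditions, "review_id": review_id}
```

Restoring keeps `trace_sealed` untouched, so the evidence seal survives the return to ACTIVE, in line with the reversibility principle.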

8. Link to the risk map and playbooks

Chain: anomaly detection (detector/validator) → score drop → quarantine → classification (P0/P1) → playbook → validation + restore → report
Disclaimer: the agent registry, score values, thresholds and trace examples are a SIMULATION β€” demonstration data of a reference skeleton. A real deployment requires calibration of thresholds, integration with an actual agent identity system, and definition of the high-risk action list per organization. Regulatory references (AI Act art. 73, NIS2, GDPR art. 33/34, DORA) are framework/educational in nature and do not constitute certification or legal advice.

Related: AI Risk Map · Response Board · Compliance · Banking demo