1. Prompt-layer attacks

8 categories · ⚡

Adversarial inputs that subvert the model's instructions — the most common runtime attack class. SkyDaemon runs 14-language, multimodal, encoded-payload, and indirect-injection detectors at every prompt boundary.

  • Direct prompt injection
    User overrides system instructions inside a prompt. 14-language corpus + ML classifier. OWASP LLM01 · AML.T0051
    HIGH
  • Indirect prompt injection
    Adversarial directives hidden in retrieved content (RAG, tool output, web fetch). OWASP LLM01 · AML.T0051.001
    CRITICAL
  • Jailbreak / persona bypass
    DAN, STAN, JANUS, AIM, grandma-trick. 50+ documented families. OWASP LLM01 · AML.T0054
    HIGH
  • Obfuscated / encoded payload
    Base64, leet, homoglyph (Cyrillic/Latin mix), zero-width — decoded text re-tested.
    MEDIUM
  • Multilingual prompt injection
    Injection translated into other languages: 14 with native detectors, 30 via multilingual models, and 60+ via a neural machine translation (NMT) bridge.
    HIGH
  • Image / multimodal injection
    Text smuggled in image pixels. Claude Vision OCR + steganography detection.
    HIGH
  • System prompt extraction probe
    Reflection / encoding / translation probes that try to elicit hidden instructions.
    HIGH
  • System prompt actually disclosed
    Output similarity vs registered system prompt. Confirmed leakage event.
    HIGH
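The "decoded text re-tested" step for obfuscated payloads can be sketched roughly as follows. This is a minimal illustration, not SkyDaemon's detector. `INJECTION_PATTERNS` is a hypothetical two-rule stand-in for the 14-language corpus and ML classifier. The decoders shown (NFKC normalization, zero-width stripping, a leet map, and base64 candidates) are assumptions about what "re-tested" covers.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical stand-in for the real injection corpus / classifier.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]

# Zero-width code points to delete (translate() maps them to None).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# Common leet substitutions mapped back to letters.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s",
                      "7": "t", "@": "a", "$": "s"})
# Runs long enough to plausibly be base64-encoded text.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_views(text: str) -> list[str]:
    """Return the raw text plus every decoded view that should be re-tested."""
    cleaned = unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)
    views = [text, cleaned, cleaned.translate(LEET)]
    for token in B64_TOKEN.findall(cleaned):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or decodes to non-UTF-8 bytes
        views.append(decoded)
    return views


def is_injection(text: str) -> bool:
    """Flag the input if any view matches any injection pattern."""
    return any(p.search(view) for view in candidate_views(text)
               for p in INJECTION_PATTERNS)
```

Re-testing every decoded view against the same detector keeps the rule set single-sourced. Each new decoder widens coverage without adding rules.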

2. Data protection & leakage

9 categories · 🔒

Sensitive data on the wire — credentials, PII / PHI / PCI, source code, copyrighted material — flowing into or out of model calls.

  • Secret / credential in prompt
    AWS, OpenAI, Anthropic, GitHub, JWT, PEM. 20+ credential regexes + entropy.
    CRITICAL
  • Secret in model output
    Model echoed credential back. Critical exfiltration event.
    CRITICAL
  • PII / PHI / PCI in prompt
    SSN, CC#, MRN, IBAN, passport pasted into LLM input. GDPR Art. 32.
    HIGH
  • PII / PHI / PCI in model output
    Output-side scanning + memorization heuristic. GDPR / HIPAA / PCI-DSS apply.
    CRITICAL
  • Proprietary source / IP exfil
    Internal code / schema pasted to external model. AST hash + canary fingerprint.
    HIGH
  • Training data extraction
    Model emits memorized training samples (canary registry + low-perplexity detector).
    HIGH
  • Markdown / image exfil channel
    SkyDaemon original. Outbound URL with high-entropy query string carrying conversation context.
    HIGH
  • Cross-border data flow / data residency
    SkyDaemon differentiator. Combines content-side PII with destination geo. GDPR-grade evidence.
    HIGH
  • Embedding inversion
    Reverse-engineer text from vector embeddings. Bulk export anomaly + vec2text.
    MEDIUM
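The "credential regexes + entropy" approach for secret detection can be sketched as below. The patterns are a small illustrative subset of well-known public key formats, not the product's 20+ rule set. The 4.0 bits-per-character threshold and the 32-character minimum token length are assumed tuning values.

```python
import math
import re
from collections import Counter

# Illustrative subset of well-known credential formats.
CREDENTIAL_PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\b"),
}
# Long opaque tokens that match no known format still get an entropy check.
GENERIC_TOKEN = re.compile(r"[A-Za-z0-9+/_\-]{32,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; assumed value


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def find_secrets(text: str) -> list[str]:
    """Names of matched credential rules, plus one entropy hit if any."""
    hits = [name for name, pat in CREDENTIAL_PATTERNS.items() if pat.search(text)]
    for token in GENERIC_TOKEN.findall(text):
        if shannon_entropy(token) >= ENTROPY_THRESHOLD:
            hits.append("high_entropy_token")
            break
    return hits
```

The same `shannon_entropy` helper could plausibly be applied to the query string of an outbound image URL. That is one way to read the markdown-exfil heuristic above, but it remains an assumption.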

3. Agent behavior & agency

7 categories · 🤖

Detections specific to autonomous agents — excessive agency, confused-deputy, runaway loops, denial-of-wallet.

  • Excessive agency
    Agent takes high-impact action beyond policy budget — write, send, transfer, deploy, pay.
    CRITICAL
  • Confused deputy
    SkyDaemon original. Per-action provenance trace. Privileged action originating from non-user-turn content.
    HIGH
  • Runaway agent loop
    Step / token / cost ceilings breached. Adaptive baseline + global cap.
    HIGH
  • Denial of Wallet (cost / token abuse)
    SkyDaemon differentiator. Cost-anomaly framed as SOC-grade event with dollar SLA.
    HIGH
  • Model denial of service
    Crafted input exhausts context / GPU / queue. Token-length outlier + repetition.
    HIGH
  • Memory / RAG poisoning attempt
    Instruction-shaped content destined for the agent's persistent memory layer.
    HIGH
  • Anomalous user behavior (UEBA)
    Composite score on token rate + topic mix + refusal rate vs 30-day baseline.
    MEDIUM
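The "adaptive baseline + global cap" idea behind runaway-loop detection can be sketched for the step dimension as follows. The ceiling is the mean of past runs plus `k` standard deviations, and it is never allowed to exceed a hard global cap. `StepGuard`, its defaults (200 steps, k = 3, five runs of warm-up), and the mean-plus-k-sigma formula are all hypothetical choices. Token and cost ceilings would follow the same shape.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class StepGuard:
    """Hypothetical per-agent step guard with an adaptive, capped ceiling."""
    global_cap: int = 200      # hard ceiling no baseline may exceed (assumed)
    k: float = 3.0             # std-devs above the mean tolerated (assumed)
    min_history: int = 5       # completed runs needed before trusting the baseline
    history: list[int] = field(default_factory=list)

    def record_run(self, steps: int) -> None:
        """Store the step count of a completed run in the baseline."""
        self.history.append(steps)

    def ceiling(self) -> float:
        if len(self.history) < self.min_history:
            return self.global_cap  # cold start: only the global cap applies
        adaptive = mean(self.history) + self.k * pstdev(self.history)
        return min(adaptive, self.global_cap)

    def breached(self, steps_so_far: int) -> bool:
        return steps_so_far > self.ceiling()
```

Clamping with `min` means a gradually inflating baseline cannot quietly raise the limit past the global cap.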

4. Supply chain & MCP

6 categories · 📦

MCP servers, model artifacts, and AI dependencies bring untrusted code and instructions into the agent loop.

  • MCP tool description poisoning
    Tool description / schema contains hidden instructions to coerce the agent.
    CRITICAL
  • Untrusted MCP server
    Endpoint outside allow-list, or anomalous TLS / cert state.
    HIGH
  • Over-permissioned tool / MCP
    IAM analyzer + 14-day usage telemetry recommends minimum-privilege manifest.
    MEDIUM
  • Malicious / untrusted model artifact
    Pickle/joblib RCE detection + missing signature + hash mismatch on registry.
    CRITICAL
  • Vulnerable AI dependency
    CVE in transformers, langchain, llama.cpp, vLLM. SBOM + OSV + GitHub Advisories + KEV.
    HIGH
  • RAG / vector store poisoning
    Hostile documents inserted to steer retrieval. Injection-corpus scan + cluster outlier.
    HIGH
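A rough heuristic for MCP tool-description poisoning is sketched below. It scans the tool's `description` and each property description inside its `inputSchema` (field names taken from MCP tool listings) for instruction-shaped phrases and hidden format characters. The phrase list is hypothetical and deliberately small. A real scanner would presumably rely on a corpus and a classifier instead.

```python
import re
import unicodedata

# Hypothetical heuristics for instructions aimed at the agent, not the user.
SUSPICIOUS_PHRASES = [
    r"ignore (all |any )?(previous|prior|other) instructions",
    r"do not (tell|inform|mention)[^.]{0,40}\buser\b",
    r"<\s*important\s*>",
    r"before using this tool[^.]{0,80}\b(read|send|upload)\b",
    r"~/\.ssh|id_rsa|\.env\b",
]
PHRASE_RE = re.compile("|".join(SUSPICIOUS_PHRASES), re.IGNORECASE)


def hidden_chars(s: str) -> list[str]:
    """Format-category (Cf) code points: zero-width, bidi controls, tag characters."""
    return [f"U+{ord(c):04X}" for c in s if unicodedata.category(c) == "Cf"]


def scan_tool(tool: dict) -> list[str]:
    """tool: an MCP-style listing with 'name', 'description', 'inputSchema'."""
    texts = [tool.get("description", "")]
    # Parameter descriptions inside the JSON Schema can carry instructions too.
    for prop in tool.get("inputSchema", {}).get("properties", {}).values():
        texts.append(prop.get("description", ""))
    findings = []
    for text in texts:
        if m := PHRASE_RE.search(text):  # first suspicious phrase per text
            findings.append(f"instruction-like text: {m.group(0)!r}")
        findings.extend(f"hidden character {cp}" for cp in hidden_chars(text))
    return findings
```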

5. Content safety

7 categories · ⚠

Output-side moderation across the major harm categories. First-party classifiers plus Llama-Guard-class output filters.

  • Harmful content (composite)
    Multi-label classifier across 9 harm categories, including violence, self-harm, hate, sexual, weapons, illicit, phishing, and profanity.
    HIGH
  • Weapons / CBRN uplift
    Operational uplift for chemical / biological / radiological / nuclear / cyber weapons.
    CRITICAL
  • Self-harm encouragement
    Suicide, self-harm, eating-disorder coaching. Crisis-resource fallback recommended.
    HIGH
  • Suspicious URL in response
    Disposable TLD, IPFS-hosted payload, typosquat, bare-IP URL, direct executable.
    HIGH
  • Unsafe output rendering (XSS / SSRF / SQLi)
    OWASP Top-10 web-attack signatures applied to the rendering target.
    HIGH
  • Off-topic / out-of-scope
    Per-agent topic allow-list. Topic classifier evaluates each turn.
    LOW
  • Custom compliance / brand violation
    Customer-defined policy DSL — competitor mention, financial advice, contract terms.
    MEDIUM
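Several of the "Suspicious URL in response" signals can be sketched as simple URL checks: bare-IP hosts, risky TLDs, IPFS gateways, and direct executable downloads. The TLD, extension, and gateway lists are hypothetical placeholders for what would normally come from threat-intel feeds. Typosquat detection is omitted here because it needs a brand or domain reference list.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Hypothetical lists; real deployments would load these from threat intel.
RISKY_TLDS = {"zip", "mov", "xyz", "top", "tk"}
EXECUTABLE_EXTS = (".exe", ".msi", ".scr", ".bat", ".ps1", ".dmg", ".apk")
IPFS_HINTS = ("ipfs.io", "dweb.link", "/ipfs/")
URL_RE = re.compile(r"https?://[^\s<>()\[\]\"']+", re.IGNORECASE)


def url_flags(url: str) -> list[str]:
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    flags = []
    try:
        ipaddress.ip_address(host)  # raises ValueError for domain names
        flags.append("bare_ip")
    except ValueError:
        if host.rsplit(".", 1)[-1] in RISKY_TLDS:
            flags.append("risky_tld")
    if any(h in host or h in parts.path for h in IPFS_HINTS):
        flags.append("ipfs")
    if parts.path.lower().endswith(EXECUTABLE_EXTS):
        flags.append("executable")
    return flags


def scan_response(text: str) -> dict[str, list[str]]:
    """Map each flagged URL in a model response to its flags."""
    result = {}
    for url in URL_RE.findall(text):
        flags = url_flags(url)
        if flags:
            result[url] = flags
    return result
```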

6. Trustworthiness & alignment

5 categories · ⚖

Hallucination, factual misinformation, bias, IP leakage, and model behavior drift.

  • Ungrounded / hallucinated response
    Claude-as-judge groundedness score below threshold. RAG faithfulness violation.
    HIGH
  • Factual misinformation
    Fact-check verifier flags ≥ 1 high-confidence false claim.
    MEDIUM
  • Biased / discriminatory output
    Counter-factual disparate-output evaluator across protected attributes.
    MEDIUM
  • Copyright / IP verbatim reproduction
    Output fingerprint vs copyrighted-corpus index.
    MEDIUM
  • Model behavior drift / silent swap
    SkyDaemon differentiator. KS-test on output-embedding distribution + golden-set probe.
    MEDIUM
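The KS-test half of drift detection might look roughly like this. The example reduces each output embedding to one number, its cosine similarity to the baseline centroid. It then compares baseline and current samples with a two-sample Kolmogorov-Smirnov statistic. That 1-D projection is an assumption, because the catalog does not say how the embedding distribution is summarized. The golden-set probe is not shown. The critical value uses the standard large-sample approximation with c(α) = 1.358 for α ≈ 0.05.

```python
import math
from bisect import bisect_right


def ks_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS D: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:  # the supremum is attained at a sample point
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical(n: int, m: int, c_alpha: float = 1.358) -> float:
    """Large-sample two-sample critical value (c_alpha = 1.358 -> alpha ~ 0.05)."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def _centroid(vectors: list[list[float]]) -> list[float]:
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def _cosine(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))


def drifted(baseline: list[list[float]], current: list[list[float]]) -> bool:
    """Project embeddings to cosine-vs-baseline-centroid, then KS-test the two samples."""
    c = _centroid(baseline)
    base = [_cosine(e, c) for e in baseline]
    cur = [_cosine(e, c) for e in current]
    return ks_statistic(base, cur) > ks_critical(len(base), len(cur))
```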

7. Adversarial ML

5 categories · ⚔

Classical adversarial-ML attacks — evasion, model extraction, inversion, poisoning.

  • Adversarial evasion
    GCG-style adversarial-suffix matcher + perceptual perturbation classifier.
    HIGH
  • Model extraction / stealing
    High-volume diverse probing to clone behavior. Tramèr-style attack signature.
    HIGH
  • Model inversion attack
    Repeated queries reconstruct training-data attributes.
    HIGH
  • Embedding inversion
    Bulk embedding-API queries from non-app caller. vec2text-style threat.
    MEDIUM
  • Training data poisoning
    Malicious samples in fine-tune / training. Hash diff + outlier cluster + canary registry.
    CRITICAL
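The "hash diff" component of training-data poisoning detection can be sketched as a comparison of the current dataset against an approved manifest of per-record SHA-256 hashes. The record shape (an id mapped to a JSON-serializable dict) and the function names are assumptions. The outlier-cluster and canary-registry components are not shown.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """SHA-256 of canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(records: dict[str, dict]) -> dict[str, str]:
    """Map each record id to its content hash."""
    return {rid: record_hash(r) for rid, r in records.items()}


def diff_against_manifest(records: dict[str, dict],
                          manifest: dict[str, str]) -> dict[str, list[str]]:
    """Report records added, removed, or modified since the manifest was approved."""
    current = build_manifest(records)
    return {
        "added": sorted(current.keys() - manifest.keys()),
        "removed": sorted(manifest.keys() - current.keys()),
        "modified": sorted(k for k in current.keys() & manifest.keys()
                           if current[k] != manifest[k]),
    }
```

Canonicalizing the JSON before hashing avoids false "modified" hits when a pipeline merely reorders keys.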

8. Posture, shadow AI & identity

8 categories · 🎯

AI-SPM coverage: shadow AI, exposed inference endpoints, misconfiguration, overprivileged AI identities.

  • Shadow AI application
    Unsanctioned AI/LLM tool in use. Egress-traffic + DNS + SaaS-catalog match.
    MEDIUM
  • Shadow / unmanaged model
    Self-deployed Ollama / vLLM / llama.cpp running off-radar.
    MEDIUM
  • Publicly exposed inference API
    Bedrock / SageMaker / Vertex / Ollama route open to 0.0.0.0/0.
    HIGH
  • AI service misconfiguration
    CSPM rule pack — invocation logging off, region drift, missing auth, weak rate-limit.
    MEDIUM
  • AI identity over-privilege
    Service principal holds permissions left unused over the trailing 14-day window.
    HIGH
  • Unowned high-risk agent
    Risk score ≥ 60 + no owner_email registered. SOC 2 / ISO 42001 governance breach.
    MEDIUM
  • Missing invocation logging
    No invocation telemetry for ≥ 24 h on a production-registered agent.
    MEDIUM
  • Cloud misconfig in AI service
    Bedrock guardrails off, SageMaker without VPC, Vertex without CMEK, etc.
    MEDIUM
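Two of the posture rules above have concrete thresholds: risk score ≥ 60 with no `owner_email`, and no invocation telemetry for ≥ 24 h on a production agent. Both can be expressed as a small rule function. `AgentRecord` and every field name except `owner_email` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AgentRecord:
    """Hypothetical inventory row for a registered agent."""
    name: str
    risk_score: int
    owner_email: Optional[str]
    production: bool
    last_telemetry: Optional[datetime]  # time of the most recent invocation log


RISK_THRESHOLD = 60
LOGGING_GAP = timedelta(hours=24)


def posture_findings(agent: AgentRecord, now: datetime) -> list[str]:
    findings = []
    if agent.risk_score >= RISK_THRESHOLD and not agent.owner_email:
        findings.append("unowned_high_risk_agent")
    if agent.production and (agent.last_telemetry is None
                             or now - agent.last_telemetry >= LOGGING_GAP):
        findings.append("missing_invocation_logging")
    return findings
```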

9. Composite kill-chains

3 categories · 🔗 · SkyDaemon-original

Cross-stage chains in the style of MITRE ATLAS, assembled by a graph walk over atomic detections. No single atomic detector can produce them alone.

  • Indirect injection → confused deputy → external action
    Hostile RAG content triggered indirect injection, produced a confused-deputy tool call, resulted in external action.
    CRITICAL
  • Secret in input → exfil channel → confirmed egress
    Same session: secret in prompt + markdown-image-exfil + outbound HTTP fetch matching exfil URL.
    CRITICAL
  • Recon → jailbreak → harm output
    System-prompt extraction probes + successful jailbreak + harmful-content emission in one session.
    HIGH
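In its simplest form, kill-chain correlation checks whether one session's atomic detections contain a chain's stages in order. This is a simplification of the graph walk described above. The detector IDs are hypothetical labels for the atomic categories in this catalog, and a real correlator would presumably also bound time gaps and link the specific artifacts, such as the exfil URL.

```python
from collections import defaultdict

# Hypothetical detector IDs for the atomic detections each chain requires.
KILL_CHAINS = {
    "indirect_injection_to_external_action":
        ["indirect_prompt_injection", "confused_deputy", "external_action"],
    "secret_exfil_confirmed":
        ["secret_in_prompt", "markdown_image_exfil", "outbound_fetch_matches_exfil_url"],
    "recon_jailbreak_harm":
        ["system_prompt_extraction_probe", "jailbreak", "harmful_content"],
}


def matches_in_order(stages: list[str], detectors: list[str]) -> bool:
    """True if stages appear in detectors as an ordered (not contiguous) subsequence."""
    it = iter(detectors)
    return all(stage in it for stage in stages)  # `in` consumes the iterator


def correlate(events) -> list[tuple[str, str]]:
    """events: iterable of (session_id, timestamp, detector_id) tuples."""
    by_session = defaultdict(list)
    for session_id, ts, detector in events:
        by_session[session_id].append((ts, detector))
    hits = []
    for session_id, evs in by_session.items():
        ordered = [d for _, d in sorted(evs)]  # time order within the session
        for chain, stages in KILL_CHAINS.items():
            if matches_in_order(stages, ordered):
                hits.append((session_id, chain))
    return hits
```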

10. Alignment & deception

7 categories · 🎯

Frontier-AI safety surface: sleeper agents, sycophancy, mesa-optimization, and sandbagging, using methodology drawn from Anthropic, Apollo Research, and the UK AISI.

  • Sleeper-agent / backdoor trigger
    Failure mode from Anthropic's 2024 sleeper-agents research. Output distribution shifts > 4σ when the trigger appears.
    CRITICAL
  • Sycophancy drift
    Paired-prompt evaluator: opinion-loaded vs neutral framing answer divergence > 0.3.
    MEDIUM
  • Sandbagging / eval-gaming
    Performance delta > 15% between covert production probes and overt evaluation-context probes.
    HIGH
  • Goal misgeneralization
    Tool-call distribution KL-divergence vs registered task profile > 0.5.
    MEDIUM
  • Reward hacking / specification gaming
    Loophole exploitation that maximizes a proxy metric without producing the intended outcome.
    HIGH
  • Deceptive chain-of-thought
    Apollo-Research-style CoT-vs-answer perturbation inconsistency. Faithfulness < 0.6.
    HIGH
  • Capability elicitation attack
    Systematic capability-probing detection.
    MEDIUM
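The goal-misgeneralization rule (tool-call KL divergence vs the registered task profile > 0.5) could be computed roughly as below. Both distributions get add-one (Laplace) smoothing so that a tool absent from the profile does not make the divergence infinite. The smoothing, the use of natural log (nats), and representing the profile as a list of sample calls are all assumptions, since the catalog does not state them.

```python
import math
from collections import Counter

KL_THRESHOLD = 0.5  # from the catalog entry; nats assumed


def distribution(calls: list[str], vocab: set[str], alpha: float = 1.0) -> dict[str, float]:
    """Smoothed tool-call frequencies over a shared vocabulary."""
    counts = Counter(calls)
    total = len(calls) + alpha * len(vocab)
    return {tool: (counts[tool] + alpha) / total for tool in vocab}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(p || q) in nats; q is strictly positive thanks to smoothing."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


def goal_misgeneralized(observed: list[str], profile: list[str],
                        threshold: float = KL_THRESHOLD) -> bool:
    vocab = set(observed) | set(profile)
    return kl_divergence(distribution(observed, vocab),
                         distribution(profile, vocab)) > threshold
```

For example, an agent profiled as mostly `search` with some `summarize` that suddenly issues only `send_email` calls scores a KL of roughly 1.8 nats under this smoothing and is flagged.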

11. Advanced model supply chain

7 categories · 🔐

Deep-supply-chain risks specific to model artifacts — pickle RCE, GGUF tampering, watermark stripping, training-data canary leakage, FL poisoning.

  • Pickle / Joblib deserialization RCE
    Pickle-opcode walk for dangerous reducers (os.system, eval, exec, subprocess.*).
    CRITICAL
  • GGUF / safetensors tampering
    Hash mismatch + activation-clustering / neural-cleanse / STRIP backdoor scan.
    CRITICAL
  • Watermark stripping / removal
    SynthID / Stable Signature / C2PA missing on output from watermarking-enabled model.
    MEDIUM
  • Training-data canary leak
    Verbatim emission of a registered canary string, yielding legally admissible evidence of training-data inclusion.
    HIGH
  • Federated-learning poisoning
    Client gradient L2 norm / cohort-mean cosine drift > threshold.
    HIGH
  • Model distillation / knowledge theft
    Volume + diversity + log-likelihood-harvesting score above tenant baseline.
    HIGH
  • Membership inference attack
    Shokri-style attack signature — query pattern with confidence-score harvesting.
    HIGH

12. Advanced encoding & obfuscation

4 categories · 🔠

Modern injection vectors that bypass first-generation filters — Unicode tag-character smuggling, ASCII steganography, EXIF metadata, RAG cache poisoning.

  • Unicode tag-character injection
    Hidden text in U+E0000–U+E007F block (Riley Goodside 2024 attack).
    HIGH
  • ASCII smuggling / homoglyph + zero-width
    Confusables + zero-width + bidi-override polyglot. UTS #39 normalize + strip.
    HIGH
  • Image alt-text / metadata injection
    Payload in HTML alt-attr or EXIF Description / Comment / Title metadata.
    HIGH
  • RAG cache / nearest-neighbor poisoning
    Single corpus document is top-k for > 20% of recent queries vs baseline ≤ 2%.
    HIGH
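Tag-character smuggling works because U+E0020 to U+E007E mirror printable ASCII at an offset of 0xE0000 and render invisibly. The sketch below both decodes the hidden text (useful as evidence) and strips it along with zero-width and bidi controls before re-testing. NFKC is used here as a weaker, stdlib-only stand-in for UTS #39 confusable skeletons, which need the Unicode `confusables.txt` data. The control-character set is an assumed subset.

```python
import unicodedata

TAG_START, TAG_END = 0xE0000, 0xE007F
# Assumed subset: zero-width characters plus bidi embedding/override/isolate controls.
ZERO_WIDTH_AND_BIDI = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF,
                       0x202A, 0x202B, 0x202C, 0x202D, 0x202E,
                       0x2066, 0x2067, 0x2068, 0x2069}


def decode_tag_smuggling(text: str) -> str:
    """Recover ASCII hidden as tag characters (U+E0020..U+E007E -> U+0020..U+007E)."""
    return "".join(chr(ord(c) - TAG_START) for c in text
                   if 0xE0020 <= ord(c) <= 0xE007E)


def sanitize(text: str) -> str:
    """Drop tag characters and zero-width/bidi controls, then NFKC-normalize."""
    kept = (c for c in text
            if not TAG_START <= ord(c) <= TAG_END
            and ord(c) not in ZERO_WIDTH_AND_BIDI)
    return unicodedata.normalize("NFKC", "".join(kept))
```

Stripping ZWJ and ZWNJ unconditionally breaks emoji sequences and some scripts (for example, Persian). A production filter would likely strip only for detection and keep the original text for display.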

Want the live, filterable catalog?

The platform exposes a search + filter UI on every threat with full per-rule metadata.
