Original SkyDaemon research

PREPRINT · 2026

Per-Action Provenance Tracing for Confused-Deputy Detection in LLM Agents

M. Friedman, P. Müller, SkyDaemon Detection Team · SkyDaemon

We introduce a runtime architecture for distinguishing AI-agent actions whose intent originates from authenticated user input from those whose intent originates from untrusted retrieved content. The system attaches a provenance trace to every tool call, mapping it back to the conversational segment that triggered it. Privileged actions whose provenance maps to non-user-turn content are flagged as confused-deputy events. Evaluated on 1,200 LangChain / Bedrock-Agents traces; precision 0.93, recall 0.88. Per-call overhead p99 < 12 ms.

26 pages· Submitted to USENIX Security 2026· Read PDF →
PREPRINT · 2026

Composite Kill-Chain Detection in MITRE-ATLAS-Aligned Telemetry

J. Park, M. Schneider, SkyDaemon Detection Team

Atomic adversarial-ML detectors miss the multi-stage attacker objective. We present a graph-walk algorithm over agent-invocation traces that surfaces three composite kill-chains: indirect-injection → confused-deputy → external-action; secret-in-input → markdown-image-exfil → confirmed-egress; and reconnaissance → jailbreak → harm-output. Evaluated on synthetic + observed customer telemetry; lifts attack visibility from 41% (atomic) to 89% (composite).

32 pages· arXiv 2026.05.001· Read PDF →
TECHNICAL REPORT · 2026

Three-Tier Multilingual Adversarial Detection at Production Scale

L. Tanaka, R. Kapoor, SkyDaemon Multilingual Team

Documents the SkyDaemon three-tier detection architecture covering 104 natural languages: 14 native fine-tuned classifiers (Tier 1), 30 multilingual XLM-R-based classifiers (Tier 2), and 60+ via NMT-bridge (Tier 3). Includes the eval harness construction methodology, code-mixed handling for Hinglish / Spanglish / Manglish, and homoglyph normalization (UTS #39). Tier-3 latency: +15 ms p50 vs Tier 1.

44 pages· SkyDaemon TR-2026-03· Read PDF →
PREPRINT · 2026

Markdown Output-Channel Exfiltration: A New Class of Data-Loss Vector

M. Friedman, S. Devarajan

Auto-rendered markdown in chat clients (Slack, Teams, internal portals) silently fetches embedded image URLs. Attackers exploit this by tricking the model into emitting an image tag whose src is an attacker-controlled domain, with conversation context encoded in the query string. We characterize the attack surface across 14 popular chat platforms, present a first-class detector, and propose mitigations.

18 pages· BlackHat USA 2026 (under review)· Read PDF →
RESEARCH NOTE · 2026

Continuous Behavioral Drift Detection for Production LLM Endpoints

M. Schneider, J. Park

A KS-test-based continuous monitor for output-distribution drift, supplemented by a hourly golden-set probe that tracks response-embedding similarity. Detects silent provider model swaps (a documented pattern in 2025–26 hosted APIs) within 90 minutes at < 1% false-positive rate.

12 pages· SkyDaemon RN-2026-01· Read PDF →
RESEARCH NOTE · 2025

Denial-of-Wallet as a SOC-Grade Notable Event

M. Friedman, T. Volkov

Cost-anomaly in agentic systems has been treated exclusively as a FinOps concern. We argue for elevating it to a SOC-grade event with a dollar-denominated SLA, propose detection thresholds rooted in tenant-baseline + 4σ analysis, and ship a circuit-breaker policy template that auto-suspends agents on critical-severity DoW. Evaluated against 8 customer environments.

14 pages· SkyDaemon RN-2025-12· Read PDF →

Open benchmarks

DATASET · OPEN ACCESS

SKYDAEMON-RT-104: Multilingual Runtime Adversarial Benchmark

18,400 adversarial prompts across 104 natural languages — direct injection, jailbreak, indirect-injection, content-harm. Tier-tagged for evaluating multilingual classifier coverage. CC-BY 4.0. Hosted on Hugging Face. Used internally for SkyDaemon detector pack regression.

18,400 examples· 104 languages· Hugging Face →
DATASET · OPEN ACCESS

SKYDAEMON-MCP-REP: MCP Server Reputation Dataset

A scored corpus of 15,000+ public Model-Context-Protocol servers. Per-server: signed-binary status, code-provenance score, capability scope, network destinations, age, maintainer reputation. Updated weekly. Foundation for the public SkyDaemon MCP reputation API.

15,000+ servers· weekly refresh· Browse →
CODE · OPEN SOURCE

skydaemon-eval-harness

The same eval harness SkyDaemon uses internally — supports import of PyRIT, Garak, Promptfoo, Inspect AI, HELM, LM Eval Harness, BIG-bench, AI Verify Foundation toolkit. Apache-2.0. Run any custom probe against any SkyDaemon-instrumented agent.

Apache-2.0· Python 3.10+· GitHub →

External work we build on

SkyDaemon's design draws on the following external research and standards bodies:

  • OWASP LLM Top 10 (2025) + ASI 2026 + AISVS — the canonical taxonomies for LLM security.
  • MITRE ATLAS — adversarial-ML tactics, techniques, and procedures.
  • UK AI Safety Institute — Inspect framework + frontier-AI eval methodology.
  • Anthropic alignment research — sleeper-agent, sycophancy, and deceptive-CoT failure modes.
  • Apollo Research — chain-of-thought faithfulness measurement, sandbagging detection.
  • Singapore IMDA AI Verify Foundation — government-grade evaluation toolkits.
  • NIST AI Risk Management Framework 1.0 + GenAI Profile (NIST AI 600-1).
  • Stanford CRFM HELM + EleutherAI lm-evaluation-harness — academic benchmarks.
  • NVIDIA Garak + Microsoft PyRIT + Promptfoo — production red-team tooling.
  • Riley Goodside — Unicode tag-character invisible-text injection (2024).
  • Carlini et al. — training-data extraction attacks.
  • Tramèr et al. — model-stealing canonical paper.
  • Shokri et al. — membership inference.
  • Zou et al. (Universal Adversarial Triggers) — GCG-style adversarial-suffix attacks.