SkyDaemon is built on published research, and we publish back. Original detection methods, adversarial-ML benchmarks, and long-form work on AI security and governance. Open access — preprints free.
We introduce a runtime architecture for distinguishing AI-agent actions whose intent originates from authenticated user input from those whose intent originates from untrusted retrieved content. The system attaches a provenance trace to every tool call, mapping it back to the conversational segment that triggered it. Privileged actions whose provenance maps to non-user-turn content are flagged as confused-deputy events. Evaluated on 1,200 LangChain / Bedrock-Agents traces; precision 0.93, recall 0.88. Per-call overhead p99 < 12 ms.
Atomic adversarial-ML detectors miss the multi-stage attacker objective. We present a graph-walk algorithm over agent-invocation traces that surfaces three composite kill-chains: indirect-injection → confused-deputy → external-action; secret-in-input → markdown-image-exfil → confirmed-egress; and reconnaissance → jailbreak → harm-output. Evaluated on synthetic + observed customer telemetry; lifts attack visibility from 41% (atomic) to 89% (composite).
Documents the SkyDaemon three-tier detection architecture covering 104 natural languages: 14 native fine-tuned classifiers (Tier 1), 30 multilingual XLM-R-based classifiers (Tier 2), and 60+ via NMT-bridge (Tier 3). Includes the eval harness construction methodology, code-mixed handling for Hinglish / Spanglish / Manglish, and homoglyph normalization (UTS #39). Tier-3 latency: +15 ms p50 vs Tier 1.
Auto-rendered markdown in chat clients (Slack, Teams, internal portals) silently fetches embedded image URLs. Attackers exploit this by tricking the model into emitting an image tag whose src is an attacker-controlled domain, with conversation context encoded in the query string. We characterize the attack surface across 14 popular chat platforms, present a first-class detector, and propose mitigations.
A KS-test-based continuous monitor for output-distribution drift, supplemented by a hourly golden-set probe that tracks response-embedding similarity. Detects silent provider model swaps (a documented pattern in 2025–26 hosted APIs) within 90 minutes at < 1% false-positive rate.
Cost-anomaly in agentic systems has been treated exclusively as a FinOps concern. We argue for elevating it to a SOC-grade event with a dollar-denominated SLA, propose detection thresholds rooted in tenant-baseline + 4σ analysis, and ship a circuit-breaker policy template that auto-suspends agents on critical-severity DoW. Evaluated against 8 customer environments.
18,400 adversarial prompts across 104 natural languages — direct injection, jailbreak, indirect-injection, content-harm. Tier-tagged for evaluating multilingual classifier coverage. CC-BY 4.0. Hosted on Hugging Face. Used internally for SkyDaemon detector pack regression.
A scored corpus of 15,000+ public Model-Context-Protocol servers. Per-server: signed-binary status, code-provenance score, capability scope, network destinations, age, maintainer reputation. Updated weekly. Foundation for the public SkyDaemon MCP reputation API.
The same eval harness SkyDaemon uses internally — supports import of PyRIT, Garak, Promptfoo, Inspect AI, HELM, LM Eval Harness, BIG-bench, AI Verify Foundation toolkit. Apache-2.0. Run any custom probe against any SkyDaemon-instrumented agent.
SkyDaemon's design draws on the following external research and standards bodies: