Gray Swan AI is an AI safety and security company that builds tools to assess vulnerabilities in AI deployments and develop more robust, attack-resistant AI models. It was founded in 2024 by Carnegie Mellon University researchers who pioneered automated jailbreaking research.
People
Updated 05/18/26
Andy Zou, Co-founder and CTO
Matt Fredrikson, Co-founder and Chief Executive Officer
Chief Strategy Officer
Chief Product Officer
Zico Kolter, Co-founder and Chief Scientist
Independent AI Red Teamer
Advisor
AI Safety Research Engineer
Funding Details
- Annual Budget: Not listed
- Current Runway: Not listed
- Funding Goal: Not listed
- Funding Raised to Date: $5,500,000
Org Details
Gray Swan AI is an AI safety and security company founded in June 2024 by researchers from Carnegie Mellon University. The company was co-founded by Matt Fredrikson (CEO), Zico Kolter (Chief Scientist), and Andy Zou (CTO), all of whom have deep roots in AI safety and security research at CMU. The company launched publicly in July 2024 with the mission of becoming the safety and security provider for the AI era.

Gray Swan's founding team was responsible for some of the most influential research in AI adversarial robustness. In 2023, they published GCG (Greedy Coordinate Gradient), the first fully automated method for jailbreaking large language models. It exposed fundamental vulnerabilities in AI safety guardrails and received coverage in the New York Times, CNN, and the Washington Post. They followed this with Circuit Breakers, a novel alignment technique that uses representation engineering to robustly prevent AI systems from generating harmful content, even under powerful unseen attacks. They have also produced widely used benchmarks, including HarmBench, WMDP, AgentHarm, and CyBench.

The company offers an AI Security Suite combining Cygnal (real-time input/output filtering and continuous agentic monitoring) with Shade (automated vulnerability testing and red-teaming). Gray Swan also operates the Gray Swan Arena, described as the world's largest red-teaming network, which has generated over three million attack attempts from researchers and developers testing frontier model safety. The Arena has hosted major competitions in partnership with the UK AI Safety Institute, the US AI Safety Institute, OpenAI, Anthropic, Google DeepMind, Amazon, and Meta.

Gray Swan is headquartered in Pittsburgh, Pennsylvania, in offices on South Craig Street near Carnegie Mellon's campus. The company raised approximately $5.5 million in early funding from friends, family, and organizations focused on AI safety and security.
Dan Hendrycks, co-founder of the Center for AI Safety and an early advisor and investor, later divested his equity stake in the company due to perceived conflicts of interest related to his role in supporting California's SB-1047 AI regulation bill, though he continued as an unpaid advisor. As of early 2026, Gray Swan continues to run active red-teaming competitions through the Arena platform, including a Safeguards Challenge with a $140,000 prize pool running through May 2026, and has published research on indirect prompt injection vulnerabilities in agentic AI systems in partnership with Stanford.
Theory of Change
Gray Swan believes that identifying and patching AI vulnerabilities before malicious actors exploit them is a critical lever for reducing catastrophic risk from advanced AI systems. By developing rigorous automated methods to find weaknesses in AI models and creating robust defenses like Circuit Breakers, they aim to raise the security baseline for the entire industry. Their theory is that frontier AI labs and enterprises deploying AI cannot reliably assess the safety of their own systems without specialized adversarial testing tools and benchmarks. By providing these tools and running large-scale red-teaming competitions, Gray Swan generates shared knowledge about AI failure modes that benefits the whole field. Making secure, hard-to-jailbreak AI models available (like their Cygnet model) also demonstrates that safety and capability are compatible, incentivizing adoption of better safety practices across the industry.
Grants Received
No grants recorded.
Projects
Gray Swan Arena: Online arena for AI red teaming where participants jailbreak and stress-test frontier models to discover vulnerabilities, win prizes, and unlock AI security career opportunities.
Cygnet: Safety-focused language model based on Llama 3, engineered and tuned for maximal safety. It is significantly more resilient to powerful adversarial attacks while preserving the key capabilities of its base model.
Shade: Comprehensive AI evaluation and continuous integration tool that automatically red-teams AI components to surface risks and vulnerabilities beyond what manual red teams can cover.