An AI safety research lab studying how software and industrial systems recursively improve themselves, building benchmarks and evaluation frameworks to understand the behavior and limits of self-improving AI systems.
People
Updated 05/18/26
Founder and CEO
Funding Details
Updated 05/18/26
- Annual Budget: -
- Current Runway: -
- Funding Goal: -
- Funding Raised to Date: -
Org Details
Updated 05/18/26
Algorithmic Research Group (ARG) is a small AI safety research lab founded in October 2024 by Matthew Kenney and headquartered in Durham, North Carolina. Its central research focus is recursive self-improvement: how software and industrial systems iteratively optimize themselves in real-world settings, and what the behavior and limits of such systems look like in practice. ARG builds benchmarks, datasets, and agentic infrastructure to study these dynamics empirically.
ARG's most notable publication is the ML Research Benchmark (MLRB), introduced in an October 2024 arXiv paper by Matthew Kenney. The MLRB comprises seven competition-level tasks derived from recent machine learning conference tracks, covering model training efficiency, pretraining on limited data, domain-specific fine-tuning, and model compression. Evaluating Claude-3.5 Sonnet and GPT-4o on these tasks, the paper found that current AI agents can produce baseline results but fall short of the capabilities required for advanced AI research, which gives a concrete measure of the current frontier.
Beyond benchmarking, ARG develops multi-agent environments designed to examine emergent behaviors, GPU kernel optimization tools, neural architecture search systems, and evaluation frameworks for frontier models. It maintains 17 repositories on GitHub and over 30 models and 42 datasets on HuggingFace, with an emphasis on open-source accessibility for the broader research community.
ARG also provides commercial services: AI benchmarking, research infrastructure, agentic model design, and consulting for organizations building agentic systems. Its ScoutML product serves as a research agent platform. The organization is early-stage, with a team of fewer than 10 people and no publicly disclosed external funding.
Theory of Change
Updated 05/18/26
ARG believes that understanding recursive self-improvement in AI systems is critical to ensuring those systems remain safe and beneficial. By building rigorous, open benchmarks and evaluation frameworks, it creates tools that concretely measure AI research capabilities and identify where current systems fail or exhibit unexpected behaviors (such as deception or shortcut-taking). This empirical grounding can inform decisions by safety researchers, frontier AI labs, and policymakers about where capability jumps are occurring and what oversight mechanisms are needed, reducing the risk that self-improving AI systems develop in ways that are difficult to detect, evaluate, or control.
Grants Received
Updated 05/18/26
Projects
Updated 05/18/26
An open benchmark that evaluates AI agents on competition-level machine learning research tasks to measure how well they can accelerate ML research.
A tool for instrumenting AI agents that logs their actions into structured sessions so teams can inspect, debug, and analyze agent behavior.