Guide Labs is a San Francisco-based AI research company building a new class of interpretable foundation models. Rather than applying post-hoc explanations to black-box models, Guide Labs integrates interpretability, safety, and reliability constraints directly into the model development pipeline — from architecture and datasets to training procedures. Their approach enables users to trace any model output back to its input context, human-understandable concepts, and original training data, making AI systems genuinely auditable and steerable without sacrificing downstream performance.
Funding Details
- Funding Raised to Date: $9,820,000
Theory of Change
Guide Labs holds that the fundamental barrier to safe and aligned AI is the opacity of current models: because we cannot reliably understand how they reason, we cannot debug failures, verify alignment, or correct problematic behaviors. Their approach is to make interpretability a first-class design constraint rather than an afterthought. By building models whose every prediction can be traced to human-understandable concepts and to source training data, they aim to give developers, auditors, and policymakers the tools to detect and correct unsafe behavior both before deployment and in production. If interpretable models can match the performance of black-box ones, as they argue their Steerling-8B model demonstrates, the field will have a viable path to AI systems that are both capable and amenable to genuine human oversight, reducing the risk of misaligned or uncontrollable AI.
Grants Received
- From Open Philanthropy
Projects
No linked projects.
People
No linked people.
Details
- Last Updated: Apr 2, 2026, 10:00 PM UTC
- Created: Mar 20, 2026, 2:34 AM UTC
