Language Model Safety Fund
About
The Language Model Safety Fund was a fiscally sponsored project under Players Philanthropy Fund (PPF), a 501(c)(3) public charity that provides back-office administration and tax-exempt status to charitable initiatives. The fund was led by Ethan Perez, an AI safety researcher who completed his PhD in natural language processing at New York University under the supervision of Kyunghyun Cho and Douwe Kiela. Its primary purpose was to hire and supervise research engineers working on language model misalignment: Perez planned to hire four engineers to conduct technical research aimed at identifying and addressing safety issues in large language models.

The fund received grant funding from two major sources. Open Philanthropy (via Good Ventures) recommended a grant of $425,800 to support salaries and equipment for language model misalignment projects. The Survival and Flourishing Fund, on a recommendation by Jaan Tallinn in its 2022-H1 round, provided $582,000 in general support, disbursed through Players Philanthropy Fund as the receiving charity.

The fund appears to have been a precursor project that contributed to the formation of FAR.AI (originally the Fund for Alignment Research, later rebranded as Frontier Alignment Research). FAR.AI was co-founded by Ethan Perez, Adam Gleave, Scott Emmons, and Claudia Shi, with a public announcement in July 2022 and formal incorporation in October 2022. Open Philanthropy's grant page for the Language Model Safety Fund now redirects to FAR.AI-related pages, suggesting the project was absorbed into the larger organization. Ethan Perez subsequently moved to Anthropic, where he leads the adversarial robustness team, while continuing to be listed as a collaborator with FAR.AI.
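The two grants together account for the full amount listed under Funding Raised to Date below, assuming that figure aggregates only these two sources:

$425,800 (Open Philanthropy) + $582,000 (Survival and Flourishing Fund) = $1,007,800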
Theory of Change
The Language Model Safety Fund operated on the theory that language models pose misalignment risks requiring dedicated technical research to identify and mitigate. By funding a small team of research engineers supervised by an expert researcher, the fund aimed to produce concrete safety research addressing the ways language models can behave in unintended or harmful ways. This work was intended to contribute to the broader effort of ensuring advanced AI systems are aligned with human values before they are widely deployed.
Details
- Start Date: -
- End Date: -
- Expected Duration: -
- Funding Raised to Date: $1,007,800
- Last Updated: Apr 3, 2026, 1:21 AM UTC
- Created: Apr 3, 2026, 1:21 AM UTC