Benchmarking LLM agents on consequential real-world tasks

Status: active

About

Updated 05/18/26

A multi-year research project at Boston University, supported by an Open Philanthropy grant, that systematically evaluates the capabilities and limitations of large language models (LLMs) on complex tasks. Led by Assistant Professor Najoung Kim with co-PI Sebastian Schuster, the project aims to develop rigorous benchmarks and analyses of how current LLMs perform on challenging, high-stakes tasks relevant to academic and scientific work.

Details

Start Date: -
End Date: -
Expected Duration: -
Funding Raised to Date: $756,396