Benchmarking LLM agents on consequential real-world tasks
active
About
Updated 05/18/26
A multi-year research project at Boston University, supported by an Open Philanthropy grant, that systematically evaluates the capabilities and limitations of large language models on complex tasks. Led by assistant professor Najoung Kim with co-PI Sebastian Schuster, the project aims to develop rigorous benchmarks and analyses of how current LLMs perform on challenging, high-stakes tasks relevant to academic and scientific work.
Details
- Start Date: -
- End Date: -
- Expected Duration: -
- Funding Raised to Date: $756,396