forecastingresearch / forecastbench
A dynamic forecasting benchmark for LLMs
☆51 Updated this week
Alternatives and similar repositories for forecastbench
Users who are interested in forecastbench are comparing it to the repositories listed below
- Forecastbench Datasets, updated nightly ☆22 Updated this week
- large population models ☆567 Updated last week
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆137 Updated last year
- Governance of the Commons Simulation (GovSim) ☆64 Updated last year
- ☆43 Updated last year
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆82 Updated last year
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 Updated 3 months ago
- A suite of open-ended, non-imitative tasks involving generalizable skills for large language model chatbots and agents to enable bootstra… ☆43 Updated last year
- ☆80 Updated last year
- Forecasting with LLMs ☆55 Updated last year
- Open source interpretability artefacts for R1. ☆170 Updated 9 months ago
- ☆194 Updated 7 months ago
- ☆57 Updated last year
- Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning ☆237 Updated 11 months ago
- HypotheSAEs: hypothesizing interpretable relationships in text datasets using sparse autoencoders. https://arxiv.org/abs/2502.04382 ☆70 Updated 3 months ago
- ☆223 Updated this week
- Automated Qualitative Analysis of LLMs (ICLR 2025) ☆52 Updated 7 months ago
- 🤝 The code for "Can Large Language Model Agents Simulate Human Trust Behaviors?" ☆109 Updated 10 months ago
- ☆48 Updated 2 months ago
- ⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper. ☆111 Updated 3 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆112 Updated last year
- ☆259 Updated 3 weeks ago
- Inference-time scaling for LLMs-as-a-judge. ☆328 Updated 3 months ago
- Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ☆31 Updated last month
- ☆118 Updated 5 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆99 Updated 4 months ago
- ☆484 Updated last year
- Causal Agent based on Large Language Model ☆61 Updated 5 months ago
- ☆169 Updated last year
- [WWW '25 Oral - GenMentor] Official code of our paper "LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutori… ☆51 Updated 2 months ago