forecastingresearch / forecastbench
A dynamic forecasting benchmark for LLMs
☆28 · Updated last week
Alternatives and similar repositories for forecastbench
Users interested in forecastbench are comparing it to the libraries listed below.
- Governance of the Commons Simulation (GovSim) ☆57 · Updated 7 months ago
- Forecastbench Datasets, updated nightly ☆13 · Updated last week
- Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ☆25 · Updated 2 months ago
- ☆45 · Updated 5 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆124 · Updated 2 months ago
- ☆198 · Updated 5 months ago
- LLM Attributor: Attribute LLM's Generated Text to Training Data ☆59 · Updated last year
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆124 · Updated last year
- Forecasting with LLMs ☆52 · Updated last year
- Scripts and materials for the paper "Simulating Opinion Dynamics with Networks of LLM-based Agents" ☆33 · Updated last year
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆54 · Updated 6 months ago
- ☆73 · Updated last year
- The Prism Alignment Project ☆79 · Updated last year
- ☆98 · Updated 4 months ago
- Data exports from select "open data" Polis conversations ☆40 · Updated 10 months ago
- ☆43 · Updated 10 months ago
- ☆23 · Updated last year
- [WWW'25 Oral - GenMentor] Supplementary resources of our paper "LLM-powered Multi-agent Framework for Goal-oriented Learning in Intellige… ☆23 · Updated 5 months ago
- ☆122 · Updated 3 weeks ago
- Learning to route instances for Human vs AI Feedback (ACL 2025 Main) ☆23 · Updated last month
- Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data! ☆24 · Updated 4 months ago
- Context is Key: A Benchmark for Forecasting with Essential Textual Information ☆71 · Updated 3 weeks ago
- Public code repo for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆110 · Updated 11 months ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 · Updated 10 months ago
- Functional Benchmarks and the Reasoning Gap ☆88 · Updated 11 months ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆71 · Updated last year
- large population models ☆400 · Updated this week
- ☆106 · Updated 6 months ago
- Open source interpretability artefacts for R1. ☆157 · Updated 4 months ago
- Repository for the paper "How Likely Do LLMs with CoT Mimic Human Reasoning?" ☆23 · Updated 6 months ago