forecastingresearch / forecastbench
A dynamic forecasting benchmark for LLMs
☆44 Updated last week
Alternatives and similar repositories for forecastbench
Users interested in forecastbench are comparing it to the repositories listed below.
- Forecastbench Datasets, updated nightly ☆18 Updated this week
- Forecasting with LLMs ☆55 Updated last year
- Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ☆28 Updated 2 weeks ago
- Governance of the Commons Simulation (GovSim) ☆59 Updated 9 months ago
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆131 Updated last year
- [WWW'25 Oral - GenMentor] Official code of our paper "LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutorin… ☆25 Updated this week
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 Updated this week
- ☆78 Updated last year
- large population models ☆440 Updated 3 weeks ago
- Open source version of Anthropic's Clio: A system for privacy-preserving insights into real-world AI use ☆50 Updated 2 months ago
- Repository of paper "How Likely Do LLMs with CoT Mimic Human Reasoning?" ☆23 Updated 8 months ago
- ☆113 Updated 2 months ago
- ☆43 Updated last year
- Data exports from select "open data" Polis conversations ☆43 Updated last year
- The repository for the scripts and materials for the paper "Simulating Opinion Dynamics with Networks of LLM-based Agents". ☆36 Updated last year
- Automated Qualitative Analysis of LLMs (ICLR 2025) ☆49 Updated 4 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆79 Updated 10 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆92 Updated last month
- ☆24 Updated last year
- Code for our NeurIPS'24 Dataset and Benchmark paper: Cooperation, Competition, and Maliciousness: LLM-Stakeholders Interactive Negotiatio… ☆40 Updated 11 months ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆81 Updated 8 months ago
- ☆31 Updated 3 months ago
- A virtual environment for developing and evaluating automated scientific discovery agents. ☆188 Updated 7 months ago
- An agent orchestration framework for economic agents ☆105 Updated 2 months ago
- Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025) ☆256 Updated 2 weeks ago
- Natural Language is All a Graph Needs - LLM / Graph AI / Knowledge Graph - Experiments ☆38 Updated 2 years ago
- The Prism Alignment Project ☆84 Updated last year
- Extracting spatial and temporal world models from LLMs ☆257 Updated 2 years ago
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆106 Updated 2 months ago
- Discovering Data-driven Hypotheses in the Wild ☆115 Updated 5 months ago