forecastingresearch / forecastbench
A dynamic forecasting benchmark for LLMs
☆41 Updated last week
Alternatives and similar repositories for forecastbench
Users who are interested in forecastbench are comparing it to the libraries listed below.
- Forecastbench Datasets, updated nightly ☆16 Updated this week
- Governance of the Commons Simulation (GovSim) ☆59 Updated 8 months ago
- large population models ☆431 Updated last week
- ☆46 Updated 7 months ago
- ☆77 Updated last year
- ☆218 Updated 7 months ago
- Forecasting with LLMs ☆54 Updated last year
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 Updated 11 months ago
- Data exports from select "open data" Polis conversations ☆43 Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆89 Updated 2 weeks ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆55 Updated 7 months ago
- ☆43 Updated 11 months ago
- Extracting spatial and temporal world models from LLMs ☆257 Updated 2 years ago
- ☆134 Updated last year
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆128 Updated last year
- Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ☆28 Updated 2 weeks ago
- Causal DAG Extraction from Text (DEFT) ☆66 Updated 9 months ago
- A distributed agent orchestration framework for market agents ☆104 Updated 2 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆136 Updated 3 months ago
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models ☆258 Updated 2 months ago
- Open source version of Anthropic's Clio: A system for privacy-preserving insights into real-world AI use ☆47 Updated last month
- A toolkit for describing model features and intervening on those features to steer behavior. ☆205 Updated 11 months ago
- Forecasting. ☆35 Updated 2 months ago
- ☆110 Updated 2 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆291 Updated last year
- Open source interpretability artefacts for R1. ☆161 Updated 5 months ago
- Context is Key: A Benchmark for Forecasting with Essential Textual Information ☆79 Updated 2 months ago
- ☆52 Updated last year
- A tree-based prefix cache library that allows rapid creation of looms: hierarchal branching pathways of LLM generations. ☆72 Updated 8 months ago