forecastingresearch / forecastbench
A dynamic forecasting benchmark for LLMs
☆48 Updated last week
Alternatives and similar repositories for forecastbench
Users who are interested in forecastbench are comparing it to the repositories listed below
- large population models ☆547 Updated 3 weeks ago
- Forecastbench Datasets, updated nightly ☆20 Updated this week
- Governance of the Commons Simulation (GovSim) ☆62 Updated 11 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆317 Updated last month
- summaries of ai research ☆52 Updated 7 months ago
- ☆54 Updated last year
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 Updated last month
- ☆79 Updated last year
- ☆44 Updated last year
- Open source interpretability artefacts for R1. ☆165 Updated 8 months ago
- ☆474 Updated last year
- Forecasting with LLMs ☆55 Updated last year
- ☆104 Updated 4 months ago
- ☆277 Updated 8 months ago
- ☆26 Updated last year
- ☆233 Updated 3 weeks ago
- Discovering Data-driven Hypotheses in the Wild ☆122 Updated 6 months ago
- Collection of evals for Inspect AI ☆313 Updated this week
- Causal DAG Extraction from Text (DEFT) ☆66 Updated 11 months ago
- ☆321 Updated last year
- ☆316 Updated last year
- A toolkit for describing model features and intervening on those features to steer behavior. ☆223 Updated last week
- An agent orchestration framework for economic agents ☆109 Updated 4 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆94 Updated 2 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆81 Updated last year
- ☆124 Updated 2 months ago
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models ☆314 Updated 4 months ago
- Benchmark for LLMs playing full press diplomacy ☆57 Updated 9 months ago
- A strongly typed Python DSL for developing message passing multi agent systems ☆53 Updated last year
- ☆116 Updated 4 months ago