forecastingresearch / forecastbench
A dynamic forecasting benchmark for LLMs
☆46 · Updated last week
Alternatives and similar repositories for forecastbench
Users who are interested in forecastbench are comparing it to the repositories listed below.
- Forecastbench Datasets, updated nightly ☆20 · Updated this week
- large population models ☆460 · Updated this week
- Governance of the Commons Simulation (GovSim) ☆61 · Updated 10 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆214 · Updated last year
- ☆43 · Updated last year
- ☆36 · Updated last month
- ☆228 · Updated last month
- Open source interpretability artefacts for R1. ☆163 · Updated 7 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆122 · Updated last year
- Inference-time scaling for LLMs-as-a-judge. ☆312 · Updated 3 weeks ago
- ☆273 · Updated 7 months ago
- ☆79 · Updated last year
- An agent orchestration framework for economic agents ☆108 · Updated 3 months ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 · Updated 3 weeks ago
- Forecasting with LLMs ☆55 · Updated last year
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆80 · Updated 11 months ago
- ☆26 · Updated last year
- ☆114 · Updated 3 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆45 · Updated last year
- ⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper. ☆92 · Updated last month
- ☆111 · Updated 9 months ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆83 · Updated 9 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆130 · Updated 9 months ago
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆133 · Updated last year
- Plurals: A System for Guiding LLMs Via Simulated Social Ensembles ☆28 · Updated last week
- METR Task Standard ☆168 · Updated 9 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆121 · Updated 2 weeks ago
- Automated Qualitative Analysis of LLMs (ICLR 2025) ☆51 · Updated 4 months ago
- ☆119 · Updated last month
- Official Repo for CRMArena and CRMArena-Pro ☆126 · Updated 3 weeks ago