forecastingresearch / forecastbench
A dynamic forecasting benchmark for LLMs
☆50 · Updated last week
Alternatives and similar repositories for forecastbench
Users who are interested in forecastbench compare it to the repositories listed below.
- Forecastbench Datasets, updated nightly ☆22 · Updated this week
- Governance of the Commons Simulation (GovSim) ☆64 · Updated last year
- ☆252 · Updated last week
- Open source interpretability artefacts for R1. ☆167 · Updated 9 months ago
- large population models ☆563 · Updated this week
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆97 · Updated 3 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆124 · Updated last year
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆45 · Updated last year
- ☆80 · Updated last year
- ☆56 · Updated last year
- Inference-time scaling for LLMs-as-a-judge. ☆325 · Updated 2 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆82 · Updated last year
- An agent orchestration framework for economic agents ☆112 · Updated 5 months ago
- Context is Key: A Benchmark for Forecasting with Essential Textual Information ☆84 · Updated 5 months ago
- ☆112 · Updated 11 months ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆86 · Updated 10 months ago
- Forecasting with LLMs ☆55 · Updated last year
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 · Updated 2 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆131 · Updated 2 weeks ago
- ☆43 · Updated last year
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models ☆330 · Updated 5 months ago
- ☆118 · Updated 5 months ago
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆136 · Updated last year
- A toolkit for describing model features and intervening on those features to steer behavior. ☆226 · Updated last month
- ☆194 · Updated 6 months ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆216 · Updated last week
- ☆214 · Updated 3 weeks ago
- ☆116 · Updated 2 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆137 · Updated 11 months ago
- ☆26 · Updated last year