Metaculus / forecasting-tools
AI forecasting tools to help humans forecast the future, plus a framework for building a Metaculus AI Benchmarking Tournament bot.
☆39Updated this week
Alternatives and similar repositories for forecasting-tools
Users who are interested in forecasting-tools are comparing it to the libraries listed below.
- ☆38Updated 2 weeks ago
- ☆32Updated 6 months ago
- A simple bot template that you can use to forecast a Metaculus tournament☆38Updated 3 months ago
- Inference-time scaling for LLMs-as-a-judge.☆316Updated last month
- Aidan Bench attempts to measure <big_model_smell> in LLMs.☆314Updated 5 months ago
- Benchmark for LLMs playing full press diplomacy☆57Updated 9 months ago
- METR Task Standard☆168Updated 10 months ago
- ☆92Updated last year
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments.☆135Updated last week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research.☆123Updated last month
- A library for making RepE control vectors☆672Updated 2 months ago
- ☆183Updated last year
- My writings about ARC (Abstraction and Reasoning Corpus)☆87Updated last week
- ⚖️ Awesome LLM Judges ⚖️☆146Updated 7 months ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.☆234Updated 4 months ago
- Open notebook for my research in culture science☆79Updated 9 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆453Updated last year
- ☆308Updated last week
- ☆104Updated 4 months ago
- Machine Learning for Alignment Bootcamp☆81Updated 3 years ago
- Draw more samples☆196Updated last year
- Parallel Reasoning: llm-consortium orchestrates multiple LLMs, iteratively refines & achieves consensus.☆369Updated last month
- Inspect: A framework for large language model evaluations☆1,580Updated this week
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse …☆768Updated this week
- smol models are fun too☆92Updated last year
- A toolkit for describing model features and intervening on those features to steer behavior.☆221Updated last week
- ☆57Updated 8 months ago
- ☆115Updated last week
- Open source interpretability artefacts for R1.☆165Updated 8 months ago
- A Loom implementation in Obsidian☆312Updated 9 months ago