dannyallover / llm_forecasting
Forecasting with LLMs
☆55 Updated last year
Alternatives and similar repositories for llm_forecasting
Users who are interested in llm_forecasting are comparing it to the libraries listed below.
- ☆58 Updated last year
- ☆133 Updated 3 months ago
- HypotheSAEs: hypothesizing interpretable relationships in text datasets using sparse autoencoders. https://arxiv.org/abs/2502.04382 ☆70 Updated 3 months ago
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆137 Updated last year
- Evaluation of neuro-symbolic engines ☆41 Updated last year
- ☆223 Updated this week
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆127 Updated last year
- ☆43 Updated last year
- Forecastbench Datasets, updated nightly ☆22 Updated this week
- ☆198 Updated 7 months ago
- ☆82 Updated last year
- Causal DAG Extraction from Text (DEFT) ☆66 Updated last year
- ☆65 Updated last week
- An attribution library for LLMs ☆46 Updated last year
- Data and code for the Corr2Cause paper (ICLR 2024) ☆114 Updated last year
- Governance of the Commons Simulation (GovSim) ☆64 Updated last year
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- Public repository containing METR's DVC pipeline for eval data analysis ☆199 Updated last week
- ☆14 Updated last year
- ☆50 Updated last year
- Prompts used in the Automated Auditing Blog Post ☆137 Updated 6 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆99 Updated 4 months ago
- EcoAssistant: using LLM assistant more affordably and accurately ☆134 Updated last year
- ☆144 Updated 6 months ago
- Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering ☆17 Updated 2 years ago
- A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper) ☆21 Updated 8 months ago
- Code for ☆28 Updated last year
- ☆26 Updated last year
- Automated Capability Discovery via Foundation Model Self-Exploration ☆66 Updated 11 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆328 Updated 3 months ago