dannyallover / llm_forecasting
Forecasting with LLMs
☆55 Updated last year
Alternatives and similar repositories for llm_forecasting
Users who are interested in llm_forecasting are comparing it to the repositories listed below.
- ☆53 Updated last year
- ☆79 Updated last year
- HypotheSAEs: Hypothesizing interpretable relationships in text datasets using sparse autoencoders. https://arxiv.org/abs/2502.04382 ☆65 Updated last month
- Causal DAG Extraction from Text (DEFT) ☆66 Updated 10 months ago
- ☆142 Updated 4 months ago
- ☆119 Updated last month
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆122 Updated last year
- ☆104 Updated 4 months ago
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆133 Updated last year
- Evaluation of neuro-symbolic engines ☆39 Updated last year
- Functional Benchmarks and the Reasoning Gap ☆90 Updated last year
- ☆190 Updated last week
- Forecastbench Datasets, updated nightly ☆20 Updated last week
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆93 Updated last month
- you.com's framework for evaluating deep research systems. ☆57 Updated 6 months ago
- Data and code for the Corr2Cause paper (ICLR 2024) ☆111 Updated last year
- Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University ☆201 Updated last week
- A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper) ☆21 Updated 6 months ago
- Probabilistic programming with large language models ☆145 Updated last week
- Materials for the course Principles of AI: LLMs at UPenn (Stat 9911, Spring 2025). LLM architectures, training paradigms (pre- and post-t… ☆43 Updated 5 months ago
- A dynamic forecasting benchmark for LLMs ☆46 Updated last week
- ☆111 Updated 9 months ago
- Public repository containing METR's DVC pipeline for eval data analysis ☆138 Updated 7 months ago
- Governance of the Commons Simulation (GovSim) ☆61 Updated 10 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆312 Updated 3 weeks ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆214 Updated last year
- Sphynx Hallucination Induction ☆53 Updated 10 months ago
- Extending Conformal Prediction to LLMs ☆68 Updated last year
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆49 Updated last year
- ☆43 Updated last year