dannyallover / llm_forecasting
Forecasting with LLMs
☆49 Updated last year
Alternatives and similar repositories for llm_forecasting
Users who are interested in llm_forecasting are comparing it to the repositories listed below
- ☆43 Updated 9 months ago
- ☆72 Updated last year
- ☆52 Updated last year
- Forecastbench Datasets, updated nightly ☆12 Updated this week
- Learning to route instances for Human vs AI Feedback (ACL 2025 Main) ☆23 Updated 2 weeks ago
- Hypothesizing interpretable relationships in text datasets using sparse autoencoders. ☆39 Updated this week
- Causal DAG Extraction from Text (DEFT) ☆66 Updated 7 months ago
- EcoAssistant: using LLM assistant more affordably and accurately ☆132 Updated last year
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆119 Updated last year
- ☆48 Updated last year
- Based on the tree of thoughts paper ☆48 Updated last year
- Functional Benchmarks and the Reasoning Gap ☆88 Updated 10 months ago
- ☆95 Updated 3 months ago
- Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data! ☆24 Updated 4 months ago
- ☆160 Updated last month
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆113 Updated last year
- A dynamic forecasting benchmark for LLMs ☆27 Updated this week
- ☆108 Updated this week
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View ☆118 Updated 2 months ago
- This is the official repository for HypoGeniC (Hypothesis Generation in Context) and HypoRefine, which are automated, data-driven tools t… ☆79 Updated this week
- ☆137 Updated 2 weeks ago
- Data and code for the Corr2Cause paper (ICLR 2024) ☆110 Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆87 Updated 10 months ago
- A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper) ☆19 Updated 2 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆272 Updated 3 weeks ago
- ☆63 Updated last year
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆76 Updated 7 months ago
- ☆104 Updated 2 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆105 Updated 7 months ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆54 Updated 5 months ago