dannyallover / llm_forecasting
Forecasting with LLMs
☆49 Updated last year
Alternatives and similar repositories for llm_forecasting
Users who are interested in llm_forecasting are comparing it to the libraries listed below
- Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data! ☆23 Updated 3 months ago
- Evaluation of neuro-symbolic engines ☆38 Updated 11 months ago
- ☆48 Updated last month
- A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper) ☆18 Updated last month
- ☆116 Updated 2 weeks ago
- ☆52 Updated last year
- ☆43 Updated 8 months ago
- Causal DAG Extraction from Text (DEFT) ☆66 Updated 6 months ago
- ☆46 Updated last month
- ☆72 Updated last year
- ☆97 Updated 2 weeks ago
- Based on the tree of thoughts paper ☆48 Updated last year
- Finding semantically meaningful and accurate prompts. ☆47 Updated last year
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆13 Updated last year
- ☆92 Updated 2 months ago
- Data and code for the Corr2Cause paper (ICLR 2024) ☆106 Updated last year
- Functional Benchmarks and the Reasoning Gap ☆88 Updated 9 months ago
- Code for ☆27 Updated 7 months ago
- Evaluating the Moral Beliefs Encoded in LLMs ☆26 Updated 7 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆109 Updated last year
- Discovering Data-driven Hypotheses in the Wild ☆99 Updated last month
- Code for "Counterfactual Token Generation in Large Language Models", Arxiv 2024. ☆28 Updated 8 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆58 Updated 7 months ago
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆118 Updated last year
- ☆32 Updated last year
- ☆48 Updated last year
- Probabilistic programming with large language models ☆124 Updated last month
- Code/data for MARG (multi-agent review generation) ☆44 Updated 8 months ago
- Learning to route instances for Human vs AI Feedback (ACL 2025 Main) ☆23 Updated 2 months ago
- Hypothesizing interpretable relationships in text datasets using sparse autoencoders. ☆33 Updated last week