sambowyer / bayes_evals
A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper)
☆16 Updated 3 weeks ago
Alternatives and similar repositories for bayes_evals
Users who are interested in bayes_evals compare it to the libraries listed below.
- Unofficial implementation of Conformal Language Modeling by Quach et al. ☆28 Updated last year
- ☆29 Updated 2 months ago
- ☆43 Updated 7 months ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆28 Updated 3 months ago
- ☆26 Updated 4 months ago
- ☆20 Updated 11 months ago
- This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration. ☆13 Updated last year
- Code for "Counterfactual Token Generation in Large Language Models", arXiv 2024. ☆26 Updated 7 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 9 months ago
- This is the repository for the CONFLARE (CONformal LArge language model REtrieval) Python package. ☆19 Updated last year
- ☆12 Updated 9 months ago
- Efficient Scaling laws and collaborative pretraining. ☆16 Updated 5 months ago
- ☆61 Updated 3 weeks ago
- Aioli: A unified optimization framework for language model data mixing ☆27 Updated 5 months ago
- we got you bro ☆35 Updated 10 months ago
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆18 Updated 2 weeks ago
- ☆35 Updated 2 years ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆140 Updated last month
- ☆23 Updated 4 months ago
- ☆61 Updated last week
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated 2 years ago
- Functional Benchmarks and the Reasoning Gap ☆87 Updated 8 months ago
- Attribution-based Parameter Decomposition ☆25 Updated 2 weeks ago
- The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆35 Updated this week
- Extending Conformal Prediction to LLMs ☆66 Updated last year
- PyTorch implementation for MRL ☆18 Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 Updated 9 months ago
- Cross-prediction-powered inference ☆14 Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆74 Updated 7 months ago
- A scalable implementation of diffusion and flow-matching with XGBoost models, applied to calorimeter data. ☆18 Updated 7 months ago