sambowyer / bayes_evals
A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper)
☆18 · Updated last month
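The paper behind bayes_evals argues for Bayesian error bars on eval accuracies rather than CLT-based ones. As a point of reference, here is a minimal sketch of the Beta-Binomial approach; it is illustrative only, not the library's API, and the counts are hypothetical.

```python
# Minimal Beta-Binomial sketch (illustrative; not the bayes_evals API).
# Model an eval's pass rate with a Beta posterior and report a credible
# interval instead of a CLT-based confidence interval.
from scipy.stats import beta

n_correct, n_total = 71, 100   # hypothetical eval results
prior_a, prior_b = 1.0, 1.0    # uniform Beta(1, 1) prior

# The Beta prior is conjugate to the Binomial likelihood, so the posterior
# is Beta(prior_a + successes, prior_b + failures).
posterior = beta(prior_a + n_correct, prior_b + (n_total - n_correct))

lo, hi = posterior.ppf([0.025, 0.975])  # central 95% credible interval
print(f"posterior mean accuracy: {posterior.mean():.3f}")
print(f"95% credible interval:   [{lo:.3f}, {hi:.3f}]")
```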
Alternatives and similar repositories for bayes_evals
Users interested in bayes_evals are comparing it to the libraries listed below.
- Extending Conformal Prediction to LLMs ☆67 · Updated last year
- Attribution-based Parameter Decomposition ☆27 · Updated last month
- PyTorch library for Active Fine-Tuning ☆87 · Updated 5 months ago
- This is the repository for the CONFLARE (CONformal LArge language model REtrieval) Python package. ☆19 · Updated last year
- ☆43 · Updated 8 months ago
- Code for "Counterfactual Token Generation in Large Language Models", arXiv 2024. ☆28 · Updated 8 months ago
- Unofficial implementation of Conformal Language Modeling by Quach et al. ☆29 · Updated 2 years ago
- ☆29 · Updated 3 months ago
- A collection of various LLM sampling methods implemented in pure PyTorch ☆23 · Updated 7 months ago
- Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?" ☆18 · Updated last month
- ☆69 · Updated last month
- ☆30 · Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated 10 months ago
- PyTorch implementation for MRL ☆19 · Updated last year
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. ☆31 · Updated 2 months ago
- ☆134 · Updated 3 months ago
- Testing Language Models for Memorization of Tabular Datasets. ☆34 · Updated 5 months ago
- Learning to route instances for Human vs AI Feedback (ACL 2025 Main) ☆23 · Updated 2 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆48 · Updated 3 months ago
- An introduction to LLM Sampling ☆79 · Updated 7 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆102 · Updated 3 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆88 · Updated 9 months ago
- ☆62 · Updated last week
- Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data! ☆23 · Updated 3 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 · Updated 3 months ago
- ☆28 · Updated 3 weeks ago
- ☆72 · Updated last year
- code for training & evaluating Contextual Document Embedding models ☆194 · Updated 2 months ago
- SDLG is an efficient method to accurately estimate aleatoric semantic uncertainty in LLMs ☆25 · Updated last year
- A package for conformal prediction with conditional guarantees (see the sketch after this list). ☆61 · Updated 4 months ago
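Several entries above (Extending Conformal Prediction to LLMs, Conformal Language Modeling, CONFLARE, and the conditional-guarantees package) share the same split-conformal core. Below is a minimal, generic NumPy sketch of that recipe, not any listed package's API; the function name, data, and alpha value are assumptions for illustration.

```python
# Generic split conformal prediction sketch (not any listed package's API).
# Calibrate absolute-residual scores on held-out data, then build intervals
# with finite-sample 1 - alpha coverage under exchangeability.
import numpy as np

def conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """residuals_cal: |y - y_hat| on a calibration split (hypothetical)."""
    n = len(residuals_cal)
    # Finite-sample-corrected quantile level of the calibration scores.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q = np.quantile(residuals_cal, min(q_level, 1.0), method="higher")
    return y_pred_test - q, y_pred_test + q

rng = np.random.default_rng(0)
resid = np.abs(rng.normal(0.0, 1.0, size=500))   # toy calibration scores
lo, hi = conformal_interval(resid, np.array([2.0, -0.5]))
print(lo, hi)  # per-point interval endpoints
```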