shuoli90 / Rank-Calibration
This is the repo for constructing a comprehensive and rigorous evaluation framework for LLM calibration.
☆12 Updated last year
Alternatives and similar repositories for Rank-Calibration
Users who are interested in Rank-Calibration are comparing it to the libraries listed below.
- Conformal Language Modeling ☆29 Updated last year
- ☆28 Updated 2 months ago
- ☆49 Updated 2 months ago
- ☆51 Updated last month
- Lightweight Adapting for Black-Box Large Language Models ☆22 Updated last year
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models ☆46 Updated last year
- ☆29 Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆57 Updated 5 months ago
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆35 Updated 6 months ago
- ☆82 Updated 9 months ago
- ☆40 Updated last year
- ☆43 Updated last year
- Code for the paper "Spectral Editing of Activations for Large Language Model Alignments" ☆24 Updated 4 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆30 Updated 3 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆79 Updated last month
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 Updated 11 months ago
- Learning adapter weights from task descriptions ☆17 Updated last year
- ☆31 Updated 2 months ago
- ☆29 Updated last year
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆65 Updated 7 months ago
- Unofficial implementation of Conformal Language Modeling by Quach et al. ☆28 Updated last year
- Augmenting Statistical Models with Natural Language Parameters ☆26 Updated 7 months ago
- [NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors ☆76 Updated 4 months ago
- LoFiT: Localized Fine-tuning on LLM Representations ☆38 Updated 4 months ago
- ☆19 Updated 10 months ago
- ☆69 Updated 3 months ago
- ☆50 Updated last year
- In-context Example Selection with Influences ☆15 Updated 2 years ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆54 Updated last year
- ☆28 Updated 9 months ago