activatedgeek / calibration-tuning
☆52 Updated 5 months ago
Alternatives and similar repositories for calibration-tuning
Users who are interested in calibration-tuning are comparing it to the libraries listed below.
- Function Vectors in Large Language Models (ICLR 2024) ☆180 Updated 5 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin… ☆52 Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆63 Updated 10 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆123 Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆134 Updated 3 months ago
- ☆97 Updated last year
- ☆45 Updated last year
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆44 Updated 5 months ago
- ☆29 Updated last year
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆108 Updated last year
- ☆41 Updated last year
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆40 Updated 10 months ago
- PASTA: Post-hoc Attention Steering for LLMs ☆123 Updated 10 months ago
- ☆91 Updated last year
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models) ☆41 Updated 4 months ago
- A library for efficient patching and automatic circuit discovery. ☆77 Updated 2 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆80 Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆80 Updated 6 months ago
- Exploring the Limitations of Large Language Models on Multi-Hop Queries ☆27 Updated 7 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆61 Updated last year
- Code for "Reasoning to Learn from Latent Thoughts" ☆119 Updated 6 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting ☆33 Updated last year
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning". ☆60 Updated last month
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆31 Updated 8 months ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ☆58 Updated last year
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs.