apple / ml-aura
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models, ICML 2024
☆22 Updated last year
Alternatives and similar repositories for ml-aura
Users who are interested in ml-aura are comparing it to the repositories listed below.
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 Updated last year
- ☆55 Updated last year
- ☆52 Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆27 Updated 8 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆92 Updated 10 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆59 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated last year
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- ☆22 Updated last month
- ☆29 Updated 2 months ago
- ☆34 Updated last year
- ☆55 Updated 11 months ago
- Minimum Description Length probing for neural network representations ☆20 Updated 8 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆71 Updated last year
- A package dedicated to running benchmark agreement testing ☆18 Updated 3 weeks ago
- ☆27 Updated 7 months ago
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆24 Updated 3 years ago
- Supercharge huggingface transformers with model parallelism. ☆77 Updated 2 months ago
- ☆149 Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆43 Updated last week
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated 2 years ago
- ☆80 Updated this week
- ☆43 Updated 3 years ago
- ☆128 Updated last year
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆73 Updated 3 months ago
- Language models scale reliably with over-training and on downstream tasks ☆100 Updated last year
- ☆57 Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆89 Updated last year
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆17 Updated last year
- ☆88 Updated last year