apple / ml-aura
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models, ICML 2024
☆20 Updated 10 months ago
Alternatives and similar repositories for ml-aura:
Users who are interested in ml-aura are comparing it to the repositories listed below
- ☆31 Updated 11 months ago
- ☆13 Updated last year
- Official repo of dataset-decomposition paper [NeurIPS 2024] ☆17 Updated 4 months ago
- ☆11 Updated last year
- ☆32 Updated last year
- ☆49 Updated last year
- ☆33 Updated 10 months ago
- ☆14 Updated 8 months ago
- ☆13 Updated 10 months ago
- Aioli: A unified optimization framework for language model data mixing ☆25 Updated 3 months ago
- Lottery Ticket Adaptation ☆39 Updated 5 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆84 Updated 5 months ago
- Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind ☆61 Updated 7 months ago
- Minimum Description Length probing for neural network representations ☆19 Updated 3 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 Updated last year
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆25 Updated 5 months ago
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆23 Updated last week
- ☆24 Updated 2 months ago
- The repository contains code for Adaptive Data Optimization ☆24 Updated 5 months ago
- Efficient Scaling laws and collaborative pretraining. ☆16 Updated 3 months ago
- Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076. ☆25 Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated 2 weeks ago
- ☆49 Updated last year
- Few-shot Learning with Auxiliary Data ☆27 Updated last year
- Tasks for describing differences between text distributions. ☆16 Updated 9 months ago
- ☆17 Updated last week
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆17 Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆30 Updated 3 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 8 months ago