apple / ml-aura
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models, ICML 2024
☆20 Updated 10 months ago
Alternatives and similar repositories for ml-aura
Users who are interested in ml-aura are comparing it to the libraries listed below.
- ☆32 Updated last year
- ☆51 Updated last year
- ☆13 Updated last year
- ☆32 Updated last year
- ☆11 Updated last year
- ☆14 Updated 9 months ago
- ☆23 Updated 3 months ago
- Tasks for describing differences between text distributions. ☆16 Updated 9 months ago
- ☆13 Updated 11 months ago
- ☆20 Updated last year
- ☆24 Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆92 Updated this week
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆87 Updated 6 months ago
- ☆13 Updated 9 months ago
- ☆20 Updated last month
- Official repo of dataset-decomposition paper [NeurIPS 2024] ☆18 Updated 4 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆43 Updated last year
- ☆25 Updated 3 months ago
- This repo contains code and data for ICLR 2025 paper MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs ☆31 Updated 2 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆29 Updated last week
- Self-Conditioning Pre-Trained Language Models, ICML 2022 ☆31 Updated 2 years ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆26 Updated last year
- ☆44 Updated last year
- Lottery Ticket Adaptation ☆39 Updated 6 months ago
- Generating and validating natural-language explanations for the brain. ☆52 Updated 2 months ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 Updated last year
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆43 Updated 6 months ago
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models" ☆42 Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 9 months ago
- MEXMA: Token-level objectives improve sentence representations ☆41 Updated 5 months ago