Decomposing and Editing Predictions by Modeling Model Computation
☆139Jun 12, 2024Updated last year
Alternatives and similar repositories for modelcomponents
Users that are interested in modelcomponents are comparing it to the libraries listed below
Sorting:
- Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models"☆25Dec 12, 2023Updated 2 years ago
- ☆51Jan 24, 2024Updated 2 years ago
- ☆25May 20, 2020Updated 5 years ago
- Official repository for our NeurIPS 2021 paper "Unadversarial Examples: Designing Objects for Robust Vision"☆105Jul 25, 2024Updated last year
- A tool to assist in the interpretation of learned features in sparse autoencoders (in particular the four SAE's trained by Joseph Bloom o…☆19Oct 4, 2024Updated last year
- Code for "Don't trust your eyes: on the (un)reliability of feature visualizations" (ICML 2024)☆34Nov 15, 2023Updated 2 years ago
- ☆32May 24, 2023Updated 2 years ago
- ☆17Dec 19, 2024Updated last year
- Code for the ICLR 2022 paper. Salient Imagenet: How to discover spurious features in deep learning?☆41Aug 19, 2022Updated 3 years ago
- This repository provides code for "On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness".☆46Nov 6, 2022Updated 3 years ago
- PyTorch Implementation of "ASTRA: An Action Spotting TRAnsformer for Soccer Videos", ACM MMSports 2023. | 3rd place solution for SoccerNe…☆42May 20, 2024Updated last year
- A fast, effective data attribution method for neural networks in PyTorch☆232Nov 18, 2024Updated last year
- ☆38Jun 10, 2021Updated 4 years ago
- Code for T-MARS data filtering☆35Aug 23, 2023Updated 2 years ago
- ModelDiff: A Framework for Comparing Learning Algorithms☆58Aug 15, 2023Updated 2 years ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.☆116Jun 13, 2024Updated last year
- A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643☆78Sep 4, 2023Updated 2 years ago
- Distilling Model Failures as Directions in Latent Space☆48Feb 8, 2023Updated 3 years ago
- A simple and efficient baseline for data attribution☆11Nov 10, 2023Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers☆259Sep 24, 2024Updated last year
- Official implementation of Tabular Transfer Learning via Prompting LLMs (COLM 2024).☆13Aug 6, 2024Updated last year
- Official repository for "Stylized Adversarial Training" (TPAMI 2022)☆11Dec 30, 2022Updated 3 years ago
- The official implementation of Cross-Task Experience Sharing (COPS)☆29Oct 23, 2024Updated last year
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations…☆29Dec 5, 2023Updated 2 years ago
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"☆45Updated this week
- Attribute statements generated by LLMs to preceding tokens using attention weights.☆22Apr 22, 2025Updated 10 months ago
- Fine-grained ImageNet annotations☆30May 25, 2020Updated 5 years ago
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation☆45Feb 27, 2023Updated 3 years ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature☆185Jun 24, 2025Updated 8 months ago
- ☆25Jun 22, 2023Updated 2 years ago
- Sparse and discrete interpretability tool for neural networks☆64Feb 12, 2024Updated 2 years ago
- Saliency Toolbox☆17Sep 17, 2023Updated 2 years ago
- This repository contains the resource introduced in the paper: "Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis"…☆25Oct 15, 2025Updated 5 months ago
- Comparison of gradient estimation techniques for black-box adversarial examples☆11Oct 31, 2018Updated 7 years ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated last year
- Jaehyung Kim et al's ACL 2023 paper on "infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-informat…☆16Jun 28, 2023Updated 2 years ago
- Do input gradients highlight discriminative features? [NeurIPS 2021] (https://arxiv.org/abs/2102.12781)☆13Jan 10, 2023Updated 3 years ago
- ☆64Apr 9, 2024Updated last year
- Implementation of Boundary Attributions for Normal (Vector) Explanations☆11Aug 13, 2021Updated 4 years ago