EleutherAI / w2s
☆23 Updated last year
Alternatives and similar repositories for w2s
Users who are interested in w2s are comparing it to the libraries listed below.
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆83 Updated 7 months ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆166 Updated 4 months ago
- ☆108 Updated 8 months ago
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Huggingface Transformers ☆41 Updated 8 months ago
- ☆98 Updated 2 years ago
- A fast, effective data attribution method for neural networks in PyTorch ☆220 Updated 11 months ago
- ☆236 Updated last year
- ☆51 Updated last year
- [ICLR 2025] General-purpose activation steering library ☆114 Updated last month
- Sparse probing paper full code. ☆62 Updated last year
- ☆46 Updated last year
- Augmenting Statistical Models with Natural Language Parameters ☆29 Updated last year
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… ☆97 Updated 4 years ago
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. ☆44 Updated 9 months ago
- Evaluate interpretability methods on localizing and disentangling concepts in LLMs. ☆56 Updated last year
- ☆241 Updated last year
- Efficient empirical NTKs in PyTorch ☆22 Updated 3 years ago
- ☆49 Updated 2 years ago
- ☆57 Updated 2 years ago
- AI Logging for Interpretability and Explainability 🔬 ☆130 Updated last year
- ☆31 Updated last year
- A library for efficient patching and automatic circuit discovery. ☆78 Updated 3 months ago
- ☆78 Updated 3 years ago
- `dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms. ☆92 Updated 2 weeks ago
- Forcing Diffuse Distributions out of Language Models ☆17 Updated last year
- ☆67 Updated 2 years ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆76 Updated last year
- ☆181 Updated 11 months ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici… ☆106 Updated last year
- ☆112 Updated 3 years ago