Ybakman / TruthTorchLM
☆37 · Updated 2 months ago
Alternatives and similar repositories for TruthTorchLM
Users interested in TruthTorchLM are comparing it to the libraries listed below.
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆70 · Updated 8 months ago
- ☆44 · Updated 3 months ago
- AI Logging for Interpretability and Explainability 🔬 ☆123 · Updated last year
- Conformal Language Modeling ☆30 · Updated last year
- ☆172 · Updated last year
- Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models" ☆102 · Updated 2 years ago
- This repository collects all relevant resources about interpretability in LLMs ☆359 · Updated 7 months ago
- ☆165 · Updated 7 months ago
- ☆93 · Updated 11 months ago
- ☆101 · Updated 3 weeks ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆96 · Updated this week
- ☆18 · Updated last year
- ☆232 · Updated last year
- A resource repository for representation engineering in large language models ☆127 · Updated 7 months ago
- Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection" ☆46 · Updated 2 months ago
- Using sparse coding to find distributed representations used by neural networks ☆256 · Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity ☆73 · Updated 3 months ago
- ☆24 · Updated last year
- General-purpose activation steering library ☆81 · Updated last month
- Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24) ☆24 · Updated 6 months ago
- ☆136 · Updated 7 months ago
- Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature ☆156 · Updated this week
- Awesome SAE papers ☆36 · Updated last month
- Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization ☆26 · Updated 11 months ago
- Python package for measuring memorization in LLMs ☆159 · Updated 7 months ago
- LLM experiments done during SERI MATS, focusing on activation steering / interpreting activation spaces ☆94 · Updated last year
- ☆69 · Updated 3 years ago
- Code for the paper "Aligning Large Language Models with Representation Editing: A Control Perspective" ☆32 · Updated 5 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆79 · Updated 3 months ago
- ☆95 · Updated last year