logix-project / logix

AI Logging for Interpretability and Explainability🔬

☆133

Alternatives and similar repositories for logix

Users who are interested in logix are comparing it to the libraries listed below.

ykwon0407 / DataInf
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)
☆77 Updated last year
roeehendel / icl_task_vectors
☆101 Updated 2 years ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆123 Updated 2 months ago
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆186 Updated 7 months ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆150 Updated 5 months ago
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆84 Updated 8 months ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆67 Updated last year
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆45 Updated 10 months ago
PAIR-code / pretraining-tda
☆29 Updated 9 months ago
milesaturpin / cot-unfaithfulness
☆51 Updated 2 years ago
davidbau / baukit
☆238 Updated last year
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆80 Updated last year
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆116 Updated 9 months ago
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆195 Updated last year
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆82 Updated 11 months ago
alon-albalak / data-selection-survey
A Survey on Data Selection for Language Models
☆252 Updated 7 months ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆12 Updated 10 months ago
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆57 Updated last month
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆123 Updated last year
tatsu-lab / test_set_contamination
☆41 Updated 2 years ago
balevinstein / Probes
☆57 Updated 2 years ago
jacobdunefsky / transcoder_circuits
☆189 Updated last year
prateeky2806 / ties-merging
☆198 Updated last year
TRAIS-Lab / dattri
`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
☆95 Updated last week
MadryLab / DsDm
☆51 Updated last year
dannyallover / overthinking_the_truth
☆29 Updated last year
abertsch72 / long-context-icl
Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration"
☆40 Updated last year
saprmarks / geometry-of-truth
☆95 Updated last year
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆106 Updated 2 weeks ago
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆80 Updated 4 months ago