fc2869 / lo-fit

LoFiT: Localized Fine-tuning on LLM Representations

☆41

Alternatives and similar repositories for lo-fit

Users who are interested in lo-fit are comparing it to the libraries listed below.


yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆66 · Updated 11 months ago
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆121 · Updated last year
javiferran / sae_entities
☆63 · Updated 7 months ago
Glaciohound / LM-Steer
Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
☆125 · Updated 3 months ago
dannyallover / overthinking_the_truth
☆29 · Updated last year
zepingyu0512 / neuron-attribution
Code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆45 · Updated 11 months ago
balevinstein / Probes
☆56 · Updated 2 years ago
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆80 · Updated last year
ericwtodd / function_vectors
Function Vectors in Large Language Models (ICLR 2024)
☆181 · Updated 6 months ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68 · Updated 2 years ago
roeehendel / icl_task_vectors
☆98 · Updated last year
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆61 · Updated last year
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆82 · Updated 7 months ago
hkust-nlp / PEM_composition
[NeurIPS 2023] GitHub repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61 · Updated last year
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆100 · Updated last month
BunsenFeng / AbstainQA
AbstainQA, ACL 2024
☆28 · Updated last year
edenbiran / RippleEdits
Evaluating the Ripple Effects of Knowledge Editing in Language Models
☆56 · Updated last year
zjysteven / mink-plus-plus
[ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs
☆45 · Updated 4 months ago
MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆31 · Updated 7 months ago
DAMO-NLP-SG / multilingual_analysis
[NeurIPS 2024] How do Large Language Models Handle Multilingualism?
☆42 · Updated 11 months ago
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆111 · Updated last month
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆117 · Updated last year
tatsu-lab / test_set_contamination
☆41 · Updated last year
Thartvigsen / GRACE
[NeurIPS'23] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
☆81 · Updated 10 months ago
logix-project / logix
AI Logging for Interpretability and Explainability🔬
☆129 · Updated last year
yihuaihong / ConceptVectors
[EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
☆35 · Updated 2 months ago
deeplearning-wisc / args
☆45 · Updated last year
RUCAIBox / HaluEval-2.0
☆47 · Updated last year
lifan-yuan / OOD_NLP
[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…
☆35 · Updated 2 years ago
ykwon0407 / DataInf
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)
☆76 · Updated last year