zyxnlp / ICL-Interpretation-Analysis-Resources

Links to publications that focus on the interpretation and analysis of in-context learning

☆11

Alternatives and similar repositories for ICL-Interpretation-Analysis-Resources

Users who are interested in ICL-Interpretation-Analysis-Resources compare it to the repositories listed below

holarissun / embedding-based-llm-alignment
Codebase for Paper Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs
☆19 Updated 4 months ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆63 Updated 9 months ago
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆77 Updated 5 months ago
zhenyu-02 / LogitLens4LLMs
A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab…
☆99 Updated 3 weeks ago
RUCAIBox / Language-Specific-Neurons
☆83 Updated 8 months ago
deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆26 Updated last year
cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.
☆265 Updated 5 months ago
LiuAmber / RAHF
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆26 Updated 11 months ago
tatsu-lab / opinions_qa
☆114 Updated last year
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆43 Updated 5 months ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆132 Updated 9 months ago
DAMO-NLP-SG / multilingual_analysis
[NeurIPS 2024] How do Large Language Models Handle Multilingualism?
☆39 Updated 9 months ago
zepingyu0512 / awesome-SAE
awesome SAE papers
☆44 Updated 3 months ago
deeplearning-wisc / args
☆44 Updated last year
balevinstein / Probes
☆55 Updated 2 years ago
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆40 Updated 7 months ago
yizhongw / truthfulqa_reeval
☆11 Updated last year
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆99 Updated this week
SuperBruceJia / Awesome-LLM-Self-Consistency
Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models
☆107 Updated last month
D2I-ai / eigenscore
☆32 Updated 8 months ago
nrimsky / LM-exp
LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces
☆95 Updated last year
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆114 Updated 11 months ago
CaoYuanpu / BiPO
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
☆31 Updated last year
nyu-mll / BBQ
Repository for the Bias Benchmark for QA dataset.
☆127 Updated last year
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆178 Updated last year
hkust-nlp / PEM_composition
[NeurIPS 2023] GitHub repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61 Updated last year
icip-cas / Verifier-Engineering
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
☆61 Updated 8 months ago
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆78 Updated last year
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆110 Updated 6 months ago
RUCAIBox / HaluEval-2.0
☆47 Updated last year