HanjieChen / Reading-List

☆57

Alternatives and similar repositories for Reading-List

Users who are interested in Reading-List are comparing it to the repositories listed below

i-gallegos / Fair-LLM-Benchmark
☆156 Updated 2 years ago
lorenzkuhn / semantic_uncertainty
☆179 Updated last year
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆141 Updated last year
zepingyu0512 / awesome-SAE
awesome SAE papers
☆60 Updated 6 months ago
Dakingrai / awesome-mechanistic-interpretability-lm-papers
☆214 Updated last year
cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.
☆285 Updated 8 months ago
jiachangliu / KATEGPT3
☆38 Updated 2 years ago
zepingyu0512 / neuron-attribution
Code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆47 Updated last year
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆67 Updated last year
ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models
This repository collects all relevant resources about interpretability in LLMs
☆384 Updated last year
allenai / unqover
UnQovering Stereotyping Biases via Underspecified Questions - EMNLP 2020 (Findings)
☆21 Updated 4 years ago
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆84 Updated 8 months ago
balevinstein / Probes
☆57 Updated 2 years ago
lyy1994 / awesome-data-contamination
The Paper List on Data Contamination for Large Language Models Evaluation.
☆106 Updated 2 weeks ago
interpretingdl / eacl2024_transformer_interpretability_tutorial
Materials for EACL2024 tutorial: Transformer-specific Interpretability
☆60 Updated last year
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆112 Updated last month
alon-albalak / data-selection-survey
A Survey on Data Selection for Language Models
☆253 Updated 7 months ago
McGill-NLP / bias-bench
ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models.
☆151 Updated 3 months ago
zepingyu0512 / awesome-LLM-neuron
☆33 Updated 5 months ago
davidbau / baukit
☆237 Updated last year
launchnlp / LitCab
☆25 Updated 5 months ago
zhenyu-02 / LogitLens4LLMs
A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab…
☆130 Updated 3 months ago
RUCAIBox / Language-Specific-Neurons
☆87 Updated 11 months ago
cloudygoose / blindspot_nlg
☆20 Updated last year
jacobdunefsky / transcoder_circuits
☆188 Updated last year
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆45 Updated 10 months ago
shmsw25 / FActScore
A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic…
☆406 Updated 7 months ago
CaoYuanpu / BiPO
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
☆37 Updated last year
DAMO-NLP-SG / multilingual_analysis
[NeurIPS 2024] How do Large Language Models Handle Multilingualism?
☆46 Updated last year
HoagyC / sparse_coding
Using sparse coding to find distributed representations used by neural networks.
☆286 Updated 2 years ago