Jometeorie / probing_llama

☆17

Alternatives and similar repositories for probing_llama

Users who are interested in probing_llama are comparing it to the repositories listed below.

jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆86 · Updated last year
pillowsofwind / Knowledge-Conflicts-Survey
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆150 · Updated last year
kevinyaobytedance / llm_unlearn
LLM Unlearning
☆178 · Updated 2 years ago
zepingyu0512 / awesome-SAE
awesome SAE papers
☆69 · Updated 7 months ago
D2I-ai / eigenscore
☆39 · Updated last year
zepingyu0512 / neuron-attribution
code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆48 · Updated last year
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆118 · Updated last year
deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆26 · Updated last year
Jeryi-Sun / ReDEeP-ICLR
The implementation of the paper "ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability"
☆56 · Updated 7 months ago
RUCAIBox / HaluEval-2.0
☆48 · Updated 2 years ago
circle-hit / SAPT
Code for ACL 2024 accepted paper titled "SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language …
☆38 · Updated 11 months ago
Zhaoyi-Li21 / creme
[ACL'2024 Findings] "Understanding and Patching Compositional Reasoning in LLMs"
☆13 · Updated last year
ydyjya / LLM-IHS-Explanation
☆55 · Updated last year
lancopku / label-words-are-anchors
Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
☆168 · Updated last year
princeton-nlp / benign-data-breaks-safety
☆43 · Updated last year
AmourWaltz / Reliable-LLM
☆179 · Updated last year
pkunlp-icler / IKE
☆25 · Updated 2 years ago
jinhaoduan / SAR
[ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
☆61 · Updated last year
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆120 · Updated 3 months ago
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆127 · Updated last year
zhenyu-02 / LogitLens4LLMs
A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab…
☆142 · Updated 4 months ago
PPPP-kaqiu / Awesome-Parallel-Reasoning
Awesome-Parallel-Reasoning: Unlocking the reasoning potential of LLMs. Papers, Code, Resources & Survey.
☆43 · Updated 2 weeks ago
lorenzkuhn / semantic_uncertainty
☆182 · Updated last year
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆64 · Updated last year
Hunter-DDM / knowledge-neurons
Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers"
☆173 · Updated last year
SALT-NLP / Efficient_Unlearning
☆38 · Updated 2 years ago
KID-22 / LLM-Unlearning-Paper-List
☆28 · Updated 2 weeks ago
nusnlp / FSPO
Official code for our paper "Reasoning Models Hallucinate More: Factuality-Aware Reinforcement Learning for Large Reasoning Models"
☆20 · Updated 2 months ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆68 · Updated last year
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆71 · Updated 3 years ago