ChnQ / TracingLLM

☆30

Alternatives and similar repositories for TracingLLM

Users who are interested in TracingLLM are comparing it to the repositories listed below.

princeton-nlp / benign-data-breaks-safety
☆41Updated last year
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆62Updated last year
yaojin17 / Unlearning_LLM
[ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
☆62Updated last year
ShuheSH / A-Survey-of-the-Reasoning-Abilities-of-LLMs
☆25Updated 8 months ago
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆99Updated last year
keven980716 / weak-to-strong-deception
[ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"
☆13Updated last year
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆62Updated 11 months ago
haotiansun14 / BBox-Adapter
Lightweight Adapting for Black-Box Large Language Models
☆24Updated last year
thu-coai / SafeUnlearning
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
☆32Updated last year
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆87Updated this week
qcznlp / uncertainty_attack
☆21Updated 2 months ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆38Updated last year
JasonForJoy / Model-Editing-Hurt
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
☆37Updated 5 months ago
tatsu-lab / test_set_contamination
☆41Updated 2 years ago
zhxieml / remiss-jailbreak
☆33Updated last year
deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆26Updated last year
OpenBMB / CPO
☆23Updated last year
SophieZheng998 / ALI-Agent
Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"
☆21Updated 3 months ago
ShuoTang123 / MATRIX
Implementation of the MATRIX framework (ICML 2024)
☆60Updated last year
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44Updated 7 months ago
Zayne-sprague / To-CoT-or-not-to-CoT
☆25Updated 7 months ago
Jiuzhouh / Uncertainty-Aware-Language-Agent
This is the official repo for Towards Uncertainty-Aware Language Agent.
☆29Updated last year
LiuAmber / RAHF
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆28Updated last year
rdi-berkeley / awesome-RLVR-boundary
A curated list of resources on Reinforcement Learning with Verifiable Rewards (RLVR) and the reasoning capability boundary of Large Langu…
☆76Updated 3 weeks ago
zjysteven / mink-plus-plus
[ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs
☆48Updated 5 months ago
VITA-Group / SEAL
Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆44Updated 7 months ago
jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆86Updated last year
yihuaihong / ConceptVectors
[EMNLP 2025 Main] ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"
☆38Updated 2 months ago
Jometeorie / KnowledgeSpread
☆35Updated last year
uservan / ThinkPO
☆17Updated 3 months ago