AlexanderVNikitin / kernel-language-entropyLinks

Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)

☆27

Alternatives and similar repositories for kernel-language-entropy

Users that are interested in kernel-language-entropy are comparing it to the libraries listed below

Sorting:

hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆61Updated last year
princeton-nlp / benign-data-breaks-safety
☆41Updated 10 months ago
deeplearning-wisc / haloscope
source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"
☆51Updated 3 months ago
VITA-Group / SEAL
Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆39Updated 4 months ago
tmlr-group / NoisyRationales
[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"
☆35Updated 2 weeks ago
ykwon0407 / DataInf
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)
☆73Updated 10 months ago
yaojin17 / Unlearning_LLM
[ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
☆59Updated 10 months ago
jinhaoduan / SAR
[ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
☆53Updated 11 months ago
swj0419 / muse_bench
☆23Updated 4 months ago
deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆25Updated last year
licong-lin / negative-preference-optimization
☆60Updated last year
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆81Updated 4 months ago
jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆77Updated 10 months ago
EnnengYang / AdaMerging
AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024.
☆88Updated 9 months ago
OPTML-Group / SOUL
Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"
☆26Updated 10 months ago
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆99Updated 2 months ago
MiaoXiong2320 / llm-uncertainty
code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"
☆125Updated last year
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆45Updated 8 months ago
uw-nsl / safechain
[ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
☆18Updated 4 months ago
LiuAmber / RAHF
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆25Updated 10 months ago
TrustGen / TrustEval-toolkit
Toolkit for evaluating the trustworthiness of generative foundation models.
☆107Updated this week
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆39Updated 6 months ago
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆143Updated 3 months ago
ybwang119 / Awesome-reasoning-safety
This repo is for the safety topic, including attacks, defenses and studies related to reasoning and RL
☆24Updated last week
zepingyu0512 / neuron-attribution
code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆39Updated 8 months ago
JasonForJoy / Model-Editing-Hurt
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
☆35Updated 2 months ago
jaechan-repo / muse_bench
☆26Updated 11 months ago
nik-dim / tall_masks
Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024]
☆47Updated 9 months ago
javiferran / sae_entities
☆60Updated 5 months ago
KID-22 / LLM-Unlearning-Paper-List
☆28Updated last year