AlexanderVNikitin / kernel-language-entropy
☆10Updated 5 months ago
Related projects ⓘ
Alternatives and complementary repositories for kernel-language-entropy
- ☆21Updated last month
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"☆47Updated last month
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024)☆53Updated last month
- Lightweight Adapting for Black-Box Large Language Models☆18Updated 9 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)☆26Updated 2 weeks ago
- Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024]☆36Updated 3 weeks ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆45Updated 7 months ago
- code repo for ICLR 2024 paper "Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs"☆74Updated 8 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers☆75Updated last month
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)☆21Updated 4 months ago
- ☆35Updated 4 months ago
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications☆60Updated last month
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives".☆14Updated 3 weeks ago
- ☆38Updated last year
- ☆26Updated 9 months ago
- [ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models☆37Updated 2 months ago
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models☆19Updated this week
- The official repo of paper "Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller"☆18Updated 3 months ago
- [NeurIPS2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging☆36Updated this week
- Codebase for reproducing the experiments of the semantic uncertainty paper (paragraph-length experiments).☆43Updated 7 months ago
- An Open Source Implementation of Anthropic's Paper: "Towards Monosemanticity: Decomposing Language Models with Dictionary Learning"☆29Updated 6 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep☆28Updated 4 months ago
- Discover and Cure: Concept-aware Mitigation of Spurious Correlation (ICML 2023)☆40Updated 7 months ago
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"☆67Updated 11 months ago
- ☆28Updated 5 months ago
- ☆16Updated 4 months ago
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024☆62Updated last month
- ☆12Updated last month
- Code for "Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models", ICLR 2024 Oral.☆20Updated 7 months ago
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue☆33Updated last week