THU-KEG / DICE
DICE: Detecting In-distribution Data Contamination with LLM's Internal State
☆11 Updated last year
Alternatives and similar repositories for DICE
Users who are interested in DICE are comparing it to the repositories listed below
- Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context ☆18 Updated last year
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes ☆20 Updated 5 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆25 Updated 11 months ago
- ☆23 Updated last year
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" ☆27 Updated 11 months ago
- Code for the 2025 ACL publication "Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs" ☆33 Updated 5 months ago
- RapidIn: Scalable Influence Estimation for Large Language Models (LLMs). The implementation for paper "Token-wise Influential Training Da… ☆20 Updated 6 months ago
- ☆12 Updated last year
- ☆50 Updated last year
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences" ☆19 Updated last year
- ☆45 Updated 8 months ago
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression." ☆17 Updated 11 months ago
- [2025-TMLR] A Survey on the Honesty of Large Language Models ☆63 Updated 11 months ago
- Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024] ☆37 Updated last year
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆68 Updated 4 months ago
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main) ☆19 Updated 4 months ago
- [EMNLP 2023] Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts ☆28 Updated 2 years ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆67 Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆12 Updated last year
- ☆25 Updated 7 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 Updated 11 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆19 Updated last year
- Public code repo for COLING 2025 paper "Aligning LLMs with Individual Preferences via Interaction" ☆39 Updated 7 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆24 Updated last year
- [NeurIPS 2024] HonestLLM: Toward an Honest and Helpful Large Language Model ☆29 Updated 5 months ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆34 Updated last year
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… ☆30 Updated last year
- ☆31 Updated 9 months ago
- ☆21 Updated last year
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆38 Updated last year