Luckfort / CD
"Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?"
☆58 · Updated last month
Related projects
Alternatives and complementary repositories for CD
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆103 · Updated 6 months ago
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆60 · Updated 3 weeks ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆26 · Updated 2 weeks ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆75 · Updated last month
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆82 · Updated 6 months ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆40 · Updated 3 weeks ago
- Official code for the paper "Towards Efficient and Effective Unlearning of Large Language Models for Recommendation" (Frontiers of Computer S… ☆34 · Updated 4 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆48 · Updated 7 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆119 · Updated last month
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆84 · Updated 5 months ago
- Codebase for Instruction Following without Instruction Tuning ☆32 · Updated last month
- ☆33 · Updated last year
- [ATTRIB @ NeurIPS 2024] When Attention Sink Emerges in Language Models: An Empirical View ☆29 · Updated last month
- The official implementation of Self-Exploring Language Models (SELM) ☆55 · Updated 5 months ago
- ☆90 · Updated 4 months ago
- [NeurIPS 2024] GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations ☆50 · Updated 2 months ago
- Code and example data for the paper "Rule Based Rewards for Language Model Safety" ☆158 · Updated 4 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs ☆76 · Updated 9 months ago
- Code for reproducing the paper "Not All Language Model Features Are Linear" ☆61 · Updated last week
- ☆46 · Updated 2 weeks ago
- Code and data for "Long-context LLMs Struggle with Long In-context Learning" ☆91 · Updated 4 months ago
- A novel approach to improving the safety of large language models, enabling them to transition effectively from an unsafe to a safe state ☆52 · Updated last month
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆109 · Updated 3 months ago
- ConceptVectors benchmark and code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆29 · Updated last month
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆68 · Updated 5 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location ☆73 · Updated 3 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning" ☆24 · Updated 3 weeks ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆48 · Updated 3 months ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆64 · Updated 5 months ago
- ☆153 · Updated 11 months ago