Luckfort / CD
[COLING'25] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
☆61 Updated last month
Alternatives and similar repositories for CD:
Users who are interested in CD are comparing it to the repositories listed below:
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆52 Updated 2 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆106 Updated 8 months ago
- 🌾 OAT: Online AlignmenT for LLMs ☆81 Updated 3 weeks ago
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆67 Updated 2 months ago
- ☆36 Updated last year
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆40 Updated last month
- ☆39 Updated last year
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆88 Updated 3 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆59 Updated 7 months ago
- Official Code for paper "Towards Efficient and Effective Unlearning of Large Language Models for Recommendation" (Frontiers of Computer S… ☆34 Updated 5 months ago
- Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning ☆39 Updated 2 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆92 Updated 8 months ago
- AnchorAttention: Improved attention for LLMs long-context training ☆202 Updated this week
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆31 Updated 2 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning". ☆42 Updated last month
- ☆23 Updated last month
- ☆52 Updated 2 weeks ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆38 Updated 2 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) ☆50 Updated 9 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆88 Updated 7 months ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆117 Updated 5 months ago
- ☆102 Updated 3 weeks ago
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… ☆44 Updated last month
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆53 Updated this week
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆53 Updated 3 months ago
- ☆56 Updated 4 months ago
- [ATTRIB @ NeurIPS 2024 Oral] When Attention Sink Emerges in Language Models: An Empirical View ☆43 Updated 3 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆131 Updated 3 months ago
- SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights ☆45 Updated 3 months ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks" ☆52 Updated 8 months ago