songmzhang / DSKD
Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same-tokenizer and cross-tokenizer LLM distillation.
☆61 · Aug 26, 2025 · Updated 5 months ago
Alternatives and similar repositories for DSKD
Users interested in DSKD are comparing it to the libraries listed below.
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) · ☆250 · Mar 13, 2025 · Updated 11 months ago
- Pytorch Implementation of "Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models", AAAI 2… · ☆38 · Feb 4, 2026 · Updated last week
- ☆31 · Mar 13, 2024 · Updated last year
- ☆15 · Apr 11, 2024 · Updated last year
- ☆24 · Oct 14, 2024 · Updated last year
- ☆22 · Oct 22, 2024 · Updated last year
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning · ☆53 · Jul 28, 2024 · Updated last year
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning". · ☆20 · Feb 26, 2025 · Updated 11 months ago
- ☆23 · Nov 26, 2024 · Updated last year
- DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling · ☆36 · Jul 12, 2024 · Updated last year
- This is the second version of the practices for the rookies of BJTUNLPers. · ☆18 · Jan 13, 2021 · Updated 5 years ago
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… · ☆30 · Jan 28, 2026 · Updated 2 weeks ago
- Pytorch Implementation of "Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling", AAA… · ☆27 · Nov 25, 2025 · Updated 2 months ago
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations · ☆19 · Jan 19, 2025 · Updated last year
- ☆14 · Jan 24, 2025 · Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) · ☆11 · Apr 18, 2025 · Updated 9 months ago
- ☆16 · Sep 4, 2025 · Updated 5 months ago
- Pytorch code for paper QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models · ☆25 · Sep 27, 2023 · Updated 2 years ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. · ☆12 · Oct 10, 2020 · Updated 5 years ago
- [ICLR 2025] Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better · ☆16 · Feb 15, 2025 · Updated last year
- ☆10 · Feb 3, 2025 · Updated last year
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… · ☆14 · Feb 4, 2025 · Updated last year
- ☆30 · Jul 22, 2024 · Updated last year
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" · ☆35 · Jun 13, 2025 · Updated 8 months ago
- Dataset Reset Policy Optimization · ☆31 · Apr 12, 2024 · Updated last year
- Codebase for Instruction Following without Instruction Tuning · ☆36 · Sep 24, 2024 · Updated last year
- ☆12 · Jun 30, 2024 · Updated last year
- ☆15 · Nov 7, 2024 · Updated last year
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop] · ☆90 · Sep 13, 2024 · Updated last year
- This repository is the implementation of the paper Training Free Pretrained Model Merging (CVPR2024). · ☆32 · Mar 5, 2024 · Updated last year
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor · ☆31 · Apr 8, 2024 · Updated last year
- ☆46 · Sep 27, 2025 · Updated 4 months ago
- ☆73 · Dec 16, 2025 · Updated last month
- ☆91 · Dec 23, 2024 · Updated last year
- ☆21 · Dec 11, 2024 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) · ☆35 · Mar 7, 2025 · Updated 11 months ago
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation · ☆34 · May 28, 2025 · Updated 8 months ago
- [ICCV 2025] Official implementation of LLaVA-KD: A Framework of Distilling Multimodal Large Language Models · ☆125 · Oct 14, 2025 · Updated 4 months ago
- ☆15 · Sep 24, 2023 · Updated 2 years ago