pphuc25 / distil-cd
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation
☆35 · Updated last year
Alternatives and similar repositories for distil-cd
Users interested in distil-cd are comparing it to the libraries listed below.
- LibMoE: A Library for Comprehensive Benchmarking of Mixture of Experts in Large Language Models ☆40 · Updated 3 months ago
- ☆270 · Updated last year
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed" ☆176 · Updated 5 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆231 · Updated 6 months ago
- [ACL 2024] LangBridge: Multilingual Reasoning Without Multilingual Supervision ☆92 · Updated 10 months ago
- VNHSGE: Vietnamese High School Graduation Examination Dataset for Large Language Models ☆28 · Updated 2 years ago
- ☆122 · Updated 6 months ago
- [ACL 2024 Demo] SeaLLMs: Large Language Models for Southeast Asia ☆172 · Updated last year
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs ☆431 · Updated last year
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆118 · Updated last year
- Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback ☆97 · Updated 2 years ago
- Large Language Models Can Self-Improve in Long-context Reasoning ☆73 · Updated 9 months ago
- Prune transformer layers ☆69 · Updated last year
- Code and data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR 2025] ☆107 · Updated 7 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆170 · Updated last year
- Training code for Baby-Llama, our submission to the strict-small track of the BabyLM challenge ☆83 · Updated last year
- The official evaluation suite and dynamic data release for MixEval ☆245 · Updated 10 months ago
- [NeurIPS 2024 Main Track] Code for the paper "Instruction Tuning With Loss Over Instructions" ☆39 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP 2024) ☆148 · Updated last year
- [ACL 2024] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆362 · Updated last year
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆83 · Updated 11 months ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆64 · Updated 4 months ago
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind ☆177 · Updated last year
- ☆72 · Updated last year
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆139 · Updated 10 months ago
- DSIR: large-scale data selection framework for language model training ☆258 · Updated last year
- ☆190 · Updated last year
- ☆18 · Updated 9 months ago
- Repo for the EMNLP 2024 paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same… ☆59 · Updated 3 weeks ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆187 · Updated last year