aitsc / GLMKD
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method; GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
☆31 · Updated last year
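The papers above study distilling a large teacher language model into a smaller student without relying on intermediate-layer features or gold labels, i.e. learning only from the teacher's output distribution. As a rough illustration of what such output-only distillation looks like (a minimal sketch, not the GLMKD implementation; `student_logits`, `teacher_logits`, and the temperature value are assumptions):

```python
import torch
import torch.nn.functional as F

def soft_label_kd_loss(student_logits: torch.Tensor,
                       teacher_logits: torch.Tensor,
                       temperature: float = 2.0) -> torch.Tensor:
    """Generic soft-label distillation loss: KL divergence between the
    student's and teacher's temperature-scaled output distributions.
    No intermediate-layer features or gold labels are used.
    Illustrative sketch only, not this repository's method."""
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # 'batchmean' matches the mathematical definition of KL divergence;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```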
Related projects
Alternatives and complementary repositories for GLMKD
- ☆88 · Updated last month
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training · ☆87 · Updated last month
- [ICML 2024] Selecting High-Quality Data for Training Language Models · ☆141 · Updated 4 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings · ☆143 · Updated 4 months ago
- Towards Systematic Measurement for Long Text Quality · ☆28 · Updated 2 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models · ☆48 · Updated 2 months ago
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning. · ☆125 · Updated last year
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) · ☆134 · Updated last month
- Retrieval as Attention · ☆83 · Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models · ☆72 · Updated 8 months ago
- ☆37 · Updated 7 months ago
- Code for the paper "A Theoretical Analysis of the Repetition Problem in Text Generation" in AAAI 2021. · ☆51 · Updated 2 years ago
- The code of paper "Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation" published at NeurIPS 202… · ☆40 · Updated 2 years ago
- Code for ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models" · ☆28 · Updated last year
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models · ☆22 · Updated 3 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. · ☆46 · Updated 4 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling · ☆36 · Updated 8 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" · ☆113 · Updated last week
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning · ☆123 · Updated 2 months ago
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process · ☆22 · Updated 3 months ago
- Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) · ☆33 · Updated 11 months ago
- ☆78 · Updated 2 years ago
- ☆59 · Updated last year
- An Experiment on Dynamic NTK Scaling RoPE · ☆61 · Updated 11 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues · ☆46 · Updated 3 months ago
- ☆26 · Updated last year
- [AAAI 2024] Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following · ☆79 · Updated last month
- ☆101 · Updated last year
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts" · ☆42 · Updated 2 years ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs. · ☆46 · Updated last month