aitsc / GLMKD
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method ; GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
★31 · Updated last year
Related projects
Alternatives and complementary repositories for GLMKD
- ★88 · Updated last month
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training ★88 · Updated last month
- [ICML 2024] Selecting High-Quality Data for Training Language Models ★146 · Updated 5 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ★139 · Updated 2 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ★51 · Updated 3 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ★73 · Updated 8 months ago
- An Experiment on Dynamic NTK Scaling RoPE ★61 · Updated 11 months ago
- Retrieval as Attention ★83 · Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ★125 · Updated 2 months ago
- ★39 · Updated 7 months ago
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models ★22 · Updated 3 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ★49 · Updated last week
- Towards Systematic Measurement for Long Text Quality ★28 · Updated 2 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ★147 · Updated 5 months ago
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning ★125 · Updated last year
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ★119 · Updated 3 weeks ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs ★47 · Updated last month
- Code for ACL 2023 paper titled "Lifting the Curse of Capacity Gap in Distilling Language Models" ★28 · Updated last year
- ★109 · Updated 4 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities ★27 · Updated 3 months ago
- Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models" ★37 · Updated 2 weeks ago
- One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning ★38 · Updated last year
- Long Context Extension and Generalization in LLMs ★39 · Updated 2 months ago
- Fantastic Data Engineering for Large Language Models ★51 · Updated 3 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems ★47 · Updated 4 months ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022) ★97 · Updated 2 years ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ★36 · Updated 8 months ago
- ★66 · Updated 2 years ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ★28 · Updated 7 months ago
- Implementations of online merging optimizers proposed by "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment" ★66 · Updated 5 months ago