aitsc / GLMKDLinks

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method ; GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

☆32

Alternatives and similar repositories for GLMKD

Users that are interested in GLMKD are comparing it to the libraries listed below

Sorting:

yegcjs / mixinglaws
☆104Updated 3 weeks ago
princeton-nlp / CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
☆157Updated last year
sail-sg / regmix
[ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)
☆157Updated 5 months ago
princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
☆184Updated last year
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆166Updated last month
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆78Updated last year
SimiaoZuo / MoEBERT
This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).
☆109Updated 3 years ago
swj0419 / in-context-pretraining
☆53Updated last year
NormXU / Consistent-DynamicNTKRoPE
An Experiment on Dynamic NTK Scaling RoPE
☆64Updated last year
QwenLM / online_merging_optimizers
Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
☆75Updated last year
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆131Updated 2 years ago
swtheing / PF-PPO-RLHF
☆33Updated 10 months ago
AI21Labs / Parallel-Context-Windows
☆104Updated 2 years ago
TsinghuaC3I / Intuitive-Fine-Tuning
[ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process
☆28Updated last year
thunlp / MoEfication
☆139Updated last year
princeton-pli / MeCo
Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"
☆41Updated last month
hemingkx / SpecDec
Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
☆42Updated last year
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆81Updated last year
OpenLMLab / LongWanjuan
Towards Systematic Measurement for Long Text Quality
☆37Updated 11 months ago
ZitongYang / Synthetic_Continued_Pretraining
Code implementation of synthetic continued pretraining
☆123Updated 7 months ago
cxcscmu / MATES
Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]
☆73Updated 8 months ago
Jxu-Thu / DITTO
The code of paper "Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation" published at NeurIPS 202…
☆46Updated 2 years ago
mtbench101 / mt-bench-101
[ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
☆105Updated last year
nick7nlp / Counting-Stars
Counting-Stars (★)
☆83Updated 2 months ago
yyDing1 / ScaleQuest
[ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
☆63Updated 9 months ago
ChaosCodes / ProPETL
One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning
☆40Updated 2 years ago
October2001 / ProLong
[ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models
☆56Updated last year
YJiangcm / FollowBench
[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
☆108Updated last month
FreedomIntelligence / OVM
☆68Updated last year
zhiyuanhubj / LongRecipe
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models
☆76Updated 9 months ago