thu-coai / MiniPLMLinks

[ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models

☆60

Alternatives and similar repositories for MiniPLM

Users that are interested in MiniPLM are comparing it to the libraries listed below

Sorting:

VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆55Updated 2 years ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆86Updated last year
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆101Updated 2 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆60Updated last year
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆88Updated 10 months ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆55Updated 8 months ago
DAMO-NLP-SG / LongPO
[ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
☆40Updated 7 months ago
Shwai-He / MEO
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":
☆38Updated last year
DAMO-NLP-SG / CLEX
[ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models
☆78Updated last year
SalesforceAIResearch / GemFilter
☆86Updated 8 months ago
locuslab / scaling_laws_data_filtering
☆65Updated last year
songmzhang / DSKD
Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same…
☆60Updated last month
Infini-AI-Lab / S2FT
☆19Updated 9 months ago
yegcjs / DiffusionLLM
Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"
☆83Updated last year
leezythu / FocusLLM
FocusLLM: Scaling LLM’s Context by Parallel Decoding
☆43Updated 10 months ago
snu-mllab / Context-Memory
Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆61Updated last year
nbasyl / DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
☆124Updated last year
CodeCreator / WebOrganizer
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
☆66Updated 5 months ago
Infini-AI-Lab / gsm_infinite
☆54Updated 3 months ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35Updated last year
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆83Updated 3 weeks ago
song-wx / SIFT
[ICML2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely
☆22Updated last year
abdelfattah-lab / TokenButler
☆25Updated 2 months ago
raymin0223 / fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
☆63Updated last year
VITA-Group / Ms-PoE
"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw…
☆30Updated last year
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147Updated last year
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆123Updated 8 months ago
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆44Updated last year
bigai-nlco / TokenSwift
[ICML 2025] |TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
☆113Updated 4 months ago
ShiZhengyan / InstructionModelling
[NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"
☆39Updated last year