VITA-Group / LiGO
[ICLR 2023] "Learning to Grow Pretrained Models for Efficient Transformer Training" by Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David Cox, Zhangyang Wang, Yoon Kim
☆92 · Updated last year
Alternatives and similar repositories for LiGO
Users interested in LiGO are comparing it to the repositories listed below.
- ☆139 · Updated 11 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆52 · Updated 2 years ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆169 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆118 · Updated last year
- ☆105 · Updated last year
- ☆147 · Updated 2 years ago
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" ☆86 · Updated 3 weeks ago
- Code for the paper "Patch-Level Training for Large Language Models" ☆85 · Updated 7 months ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆124 · Updated last year
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… ☆101 · Updated last year
- Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models ☆142 · Updated 2 years ago
- Sparse Backpropagation for Mixture-of-Expert Training ☆29 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆80 · Updated last year
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆78 · Updated last year
- Official PyTorch implementation of "DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs" (ICML 2025 Oral) ☆29 · Updated 2 weeks ago
- ☆183 · Updated last year
- [NeurIPS 2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆86 · Updated 9 months ago
- [ICML'24] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆121 · Updated 6 months ago
- Code for the ICLR 2025 paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆91 · Updated last month
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆165 · Updated last year
- [EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models ☆79 · Updated last year
- ☆95 · Updated last year
- ☆127 · Updated last year
- Some preliminary explorations of Mamba's context scaling. ☆214 · Updated last year
- Language models scale reliably with over-training and on downstream tasks ☆97 · Updated last year
- This package implements THOR: Transformer with Stochastic Experts. ☆65 · Updated 3 years ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆16 · Updated 2 months ago
- ☆129 · Updated 2 years ago
- ☆56 · Updated 6 months ago
- [EMNLP 2022] Official implementation of Transnormer from the paper "The Devil in Linear Transformer" ☆61 · Updated last year