Masked Structural Growth for 2x Faster Language Model Pre-training
☆25 · Apr 28, 2024 · Updated last year
Alternatives and similar repositories for MSG
Users who are interested in MSG are comparing it to the repositories listed below.
- An Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales ☆16 · Jun 6, 2024 · Updated last year
- ☆10 · Feb 3, 2025 · Updated last year
- Staged Training for Transformer Language Models ☆33 · Mar 31, 2022 · Updated 3 years ago
- [Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆15 · Feb 12, 2026 · Updated 2 weeks ago
- Fork of Flame repo for training of some new stuff in development ☆19 · Feb 20, 2026 · Updated last week
- Transmute AI Lab Model Efficiency Toolkit ☆19 · Oct 2, 2023 · Updated 2 years ago
- Open Source Implementation of Dual Modality MAGVIT2 Tokenizer ☆23 · Nov 26, 2024 · Updated last year
- Implementation of MixCE method described in ACL 2023 paper by Zhang et al. ☆20 · May 29, 2023 · Updated 2 years ago
- Code for the paper "Function-Space Learning Rates" ☆25 · Jun 3, 2025 · Updated 9 months ago
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code. ☆22 · Nov 26, 2022 · Updated 3 years ago
- ☆18 · Sep 5, 2024 · Updated last year
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆32 · Sep 22, 2024 · Updated last year
- ☆34 · Aug 23, 2023 · Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Jun 12, 2024 · Updated last year
- Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model ☆23 · Nov 15, 2025 · Updated 3 months ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆33 · Jun 2, 2023 · Updated 2 years ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆32 · Jul 17, 2023 · Updated 2 years ago
- ☆43 · Oct 13, 2023 · Updated 2 years ago
- Continual Resilient (CoRe) Optimizer for PyTorch ☆11 · Jun 10, 2024 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆51 · Feb 24, 2026 · Updated last week
- ☆31 · Mar 13, 2024 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆88 · Jun 4, 2024 · Updated last year
- LMTuner: Make the LLM Better for Everyone ☆38 · Sep 21, 2023 · Updated 2 years ago
- An EXA-Scale repository of Multi-Modality AI resources from papers and models, to foundational libraries! ☆40 · Feb 1, 2024 · Updated 2 years ago
- ☆42 · Apr 23, 2024 · Updated last year
- Source code for paper: Knowledge Inheritance for Pre-trained Language Models ☆38 · Apr 24, 2022 · Updated 3 years ago
- Code for the paper "No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations" ☆12 · Oct 31, 2024 · Updated last year
- ☆14 · Jan 10, 2025 · Updated last year
- Official GraphQLBlog repository. Add your blog posts as pull request! ☆13 · Jan 11, 2023 · Updated 3 years ago
- A simple script to add pdf-files to Zotero via CLI ☆12 · May 17, 2020 · Updated 5 years ago
- ☆11 · Sep 8, 2024 · Updated last year
- Moral Machine Experiment on LLMs ☆11 · Feb 2, 2026 · Updated last month
- ☆12 · Jul 25, 2023 · Updated 2 years ago
- A simple agent powered by LLMs that performs tasks. ☆13 · Apr 25, 2025 · Updated 10 months ago
- Improving transparency of large language models' reasoning ☆14 · Nov 25, 2025 · Updated 3 months ago
- A drag-and-drop-enabled, responsive envelope graph that lets you shape a wave with attack, decay, sustain and release ☆11 · Jan 5, 2023 · Updated 3 years ago
- ☆16 · Jul 29, 2025 · Updated 7 months ago
- Unofficial implementation of the paper: Exploring the Space of Key-Value-Query Models with Intention ☆12 · May 24, 2023 · Updated 2 years ago
- Dataset and code to reproduce the results of the paper "Evolving Structures in Complex Systems" ☆11 · Dec 16, 2019 · Updated 6 years ago