Masked Structural Growth for 2x Faster Language Model Pre-training
☆25Apr 28, 2024Updated 2 years ago
Alternatives and similar repositories for MSG
Users that are interested in MSG are comparing it to the libraries listed below.
- An Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales☆16Jun 6, 2024Updated 2 years ago
- Staged Training for Transformer Language Models☆33Mar 31, 2022Updated 4 years ago
- ☆18Sep 5, 2024Updated last year
- A pipeline for using API calls to agnostically convert unstructured data into structured training data☆32Sep 22, 2024Updated last year
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales☆32Jul 17, 2023Updated 2 years ago
- [Poster; ICLR 2026] [Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers☆16Apr 15, 2026Updated 2 months ago
- Implementation of MixCE method described in ACL 2023 paper by Zhang et al.☆20May 29, 2023Updated 3 years ago
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code.☆22Nov 26, 2022Updated 3 years ago
- Code for the paper "Function-Space Learning Rates"☆23Jun 3, 2025Updated last year
- Official repository for the paper "Exploring the Promise and Limits of Real-Time Recurrent Learning" (ICLR 2024)☆13Jun 11, 2025Updated last year
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning☆33Jun 2, 2023Updated 3 years ago
- [ICLR 2023] "Learning to Grow Pretrained Models for Efficient Transformer Training" by Peihao Wang, Rameswar Panda, Lucas Torroba Hennige…☆92Feb 26, 2024Updated 2 years ago
- Official Implementation (PyTorch) of the "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025☆35Feb 22, 2026Updated 4 months ago
- Log-Polar Space Convolution for Convolutional Neural Networks☆13Dec 12, 2022Updated 3 years ago
- Fork of Flame repo for training of some new stuff in development☆19Jun 23, 2026Updated last week
- A Claude Skill that removes the AI-generated feel from Japanese text☆265Jun 11, 2026Updated 3 weeks ago
- ☆19Jan 2, 2024Updated 2 years ago
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral)☆36Jan 18, 2025Updated last year
- Official repo for Splitting Steepest Descent for Growing Neural Architectures☆13May 12, 2021Updated 5 years ago
- Block-Recurrent Dynamics in ViTs 🦖☆46May 21, 2026Updated last month
- Images of example pages from Transkribus model training sets to make it easier to find a match.☆16Jan 25, 2022Updated 4 years ago
- NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning☆29Jul 28, 2024Updated last year
- Generic build server☆65May 25, 2014Updated 12 years ago
- ☆36Aug 23, 2023Updated 2 years ago
- ☆14Nov 21, 2017Updated 8 years ago
- Code for AdaXpert (ICML'21)☆16Jul 19, 2021Updated 4 years ago
- An experiment to see if ChatGPT can improve the output of the Stanford Alpaca dataset☆12Mar 29, 2023Updated 3 years ago
- A sample app to debug and validate cellular modems on balena devices☆13Jun 5, 2019Updated 7 years ago
- MongoEngine flask extension with WTF model forms support☆14Nov 18, 2025Updated 7 months ago
- ☆13May 17, 2025Updated last year
- Code for "Merging Text Transformers from Different Initializations"☆20Feb 2, 2025Updated last year
- Docker for everyday deep learning research on a remote server. (TensorFlow & PyTorch / JAX + VNC)☆25Jan 13, 2026Updated 5 months ago
- Code for the ICML 2021 paper "iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients"☆10May 27, 2021Updated 5 years ago
- Economics diagrams in Ti𝑘Z☆19Nov 2, 2019Updated 6 years ago
- Codebase for Extracting Reward Functions from Diffusion Models☆16Dec 7, 2023Updated 2 years ago
- React 0.13 with ES6, Immutable.js and Flux, Isomorphic as well☆11Mar 10, 2015Updated 11 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆13Jun 26, 2026Updated last week
- PyTorch implementation of Language model compression with weighted low-rank factorization☆14Jun 28, 2023Updated 3 years ago
- ☆14Dec 13, 2018Updated 7 years ago