yuezhouhu / 2by4-pretrain
Efficient 2:4 sparse training algorithms and implementations
☆59 · Dec 8, 2024 · Updated last year
Alternatives and similar repositories for 2by4-pretrain
Users that are interested in 2by4-pretrain are comparing it to the libraries listed below
- Code for "Accelerating Transformer Pre-training with 2:4 Sparsity" ☆27 · Dec 8, 2024 · Updated last year
- Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025) ☆18 · Jul 1, 2025 · Updated 7 months ago
- ☆61 · Jul 21, 2024 · Updated last year
- Official implementation of Self-Remixing ☆17 · Feb 3, 2024 · Updated 2 years ago
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- SporeAgent: Reinforced Scene-level Plausibility for Object Pose Refinement ☆11 · May 23, 2024 · Updated last year
- ☆11 · Dec 26, 2025 · Updated last month
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆106 · Dec 20, 2024 · Updated last year
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆17 · Nov 7, 2024 · Updated last year
- ☆244 · Nov 9, 2022 · Updated 3 years ago
- Official repo for AAAI 2023 paper "Stable Learning via Sparse Variable Independence". ☆13 · Jun 6, 2024 · Updated last year
- ☆158 · Feb 15, 2025 · Updated last year
- A family of efficient edge language models in 100M~1B sizes. ☆19 · Feb 14, 2025 · Updated last year
- Easily turn large sets of audio urls to an audio dataset. ☆21 · Dec 27, 2022 · Updated 3 years ago
- ☆19 · Dec 31, 2025 · Updated last month
- ☆15 · Apr 26, 2022 · Updated 3 years ago
- Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models. ☆483 · Nov 26, 2024 · Updated last year
- Crawler for the Tsinghua University graduate social practice system ☆17 · Jun 4, 2024 · Updated last year
- HomebrewNLP in JAX flavour for maintainable TPU-Training ☆51 · Jan 20, 2024 · Updated 2 years ago
- The loss landscape of Large Language Models resembles a basin! ☆36 · Jul 8, 2025 · Updated 7 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 2 years ago
- 16-fold memory access reduction with nearly no loss ☆110 · Mar 26, 2025 · Updated 10 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆73 · Updated this week
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models ☆187 · Jan 1, 2025 · Updated last year
- Codes accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 · Jul 30, 2020 · Updated 5 years ago
- code for the paper "A Statistical Framework for Low-bitwidth Training of Deep Neural Networks" ☆29 · Oct 31, 2020 · Updated 5 years ago
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ☆128 · Jul 13, 2024 · Updated last year
- Minimal implementation of PCA in PyTorch, tested against scikit-learn's implementation ☆29 · Feb 24, 2025 · Updated 11 months ago
- GPU operators for sparse tensor operations ☆35 · Mar 11, 2024 · Updated last year
- gzip Predicts Data-dependent Scaling Laws ☆34 · May 28, 2024 · Updated last year
- ☆36 · Jan 6, 2026 · Updated last month
- Low-bit optimizers for PyTorch ☆138 · Oct 9, 2023 · Updated 2 years ago
- Converts stable diffusion embeddings to loadable pngs ☆40 · Dec 6, 2022 · Updated 3 years ago
- A python algorithm to change the pitch of the voice in real time ☆13 · Dec 13, 2020 · Updated 5 years ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 · Jul 19, 2024 · Updated last year
- Code for ICML 2021 submission ☆35 · Mar 24, 2021 · Updated 4 years ago
- ☆54 · Dec 17, 2025 · Updated last month
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆54 · Jan 12, 2026 · Updated last month
- Code and Model for NeurIPS 2024 Spotlight Paper "Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training… ☆44 · Oct 16, 2024 · Updated last year