yuezhouhu/2by4-pretrain

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/yuezhouhu/2by4-pretrain)

yuezhouhu / 2by4-pretrain

Efficient 2:4 sparse training algorithms and implementations

☆62

Alternatives and similar repositories for 2by4-pretrain

Users that are interested in 2by4-pretrain are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

thu-ml / 2by4-pretrain-acc-examples
View on GitHub
Code for "Accelerating Transformer Pre-training with 2:4 Sparsity"
☆28Dec 8, 2024Updated last year
yuezhouhu / residual-context-diffusion
View on GitHub
[ICML 2026] Residual Context Diffusion (RCD): Repurposing discarded signals as structured priors for high-performance reasoning in dLLMs.
☆59Jun 28, 2026Updated last month
thu-ml / Adaptive-Sparse-Trainer
View on GitHub
Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)
☆19Jul 1, 2025Updated last year
yuezhouhu / adaspec
View on GitHub
A selective knowledge distillation algorithm for efficient speculative decoders
☆39Nov 27, 2025Updated 8 months ago
hyhuang00 / moe_inference
View on GitHub
Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts".
☆19Oct 30, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
aojunzz / NM-sparsity
View on GitHub
☆245Nov 9, 2022Updated 3 years ago
nikhilvyas / SOAP_MUON
View on GitHub
Combining SOAP and MUON
☆25Feb 11, 2025Updated last year
FasterDecoding / TEAL
View on GitHub
☆168Feb 15, 2025Updated last year
thu-ml / TetraJet-v2-NVFP4Training
View on GitHub
[ICML 2026 Spotlight] Official implementation of TetraJet-v2: Accurate NVFP4 Training for LLMs, with fully-NVFP4 linear layer with unbias…
☆16Jul 3, 2026Updated 3 weeks ago
Adlik / model_zoo
View on GitHub
☆11Dec 26, 2025Updated 7 months ago
goldblum / free-lunch
View on GitHub
Implementation of experiments from The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
☆17May 14, 2023Updated 3 years ago
JL-Cheng / SERE
View on GitHub
[ICLR 2026] SERE: Similarity-Based Expert Re-routing for Efficient Batch Decoding in MoE Models
☆18Feb 4, 2026Updated 5 months ago
NVlabs / COAT
View on GitHub
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆263Aug 9, 2025Updated 11 months ago
genuszty / FastPCI
View on GitHub
☆14Jul 24, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
papers-submission / structured_transposable_masks
View on GitHub
Code for ICML 2021 submission
☆35Mar 24, 2021Updated 5 years ago
HankYe / Once-for-Both
View on GitHub
[CVPR'24] Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
☆16Jul 1, 2024Updated 2 years ago
haochengxi / Train_Transformers_with_INT4
View on GitHub
☆157Jun 22, 2023Updated 3 years ago
THUDM / APAR
View on GitHub
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
☆14Jul 22, 2024Updated 2 years ago
Su-my / TRAPO
View on GitHub
The official repository for Trust-Region Adaptive Policy Optimization (TRAPO) – a novel hybrid framework designed to enhance large langua…
☆16Mar 2, 2026Updated 4 months ago
JiaweiXu8 / Grid4D
View on GitHub
[NeurIPS 2024] Official implementation of "Grid4D: 4D Decomposed Hash Encoding for High-Fidelity Dynamic Gaussian Splatting"
☆94Jul 27, 2025Updated last year
VITA-Group / Random-MoE-as-Dropout
View on GitHub
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆56Feb 28, 2023Updated 3 years ago
Harry-Chen / thshijian-crawler
View on GitHub
清华大学研究生社会实践系统爬虫
☆17Jun 4, 2024Updated 2 years ago
spcl / QuaRot
View on GitHub
Code for Neurips24 paper: QuaRot, an end-to-end 4-bit inference of large language models.
☆526Nov 26, 2024Updated last year
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
hyoseok1223 / Product-of-Experts-GAN
View on GitHub
PyTorch unoffical implementation of "PoE-GAN : Multimodal Conditional Image Synthesis with Product-of-Experts GANs"
☆15Mar 29, 2023Updated 3 years ago
thunlp / BlockFFN
View on GitHub
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19Jan 10, 2026Updated 6 months ago
wangsiping97 / FastGEMV
View on GitHub
High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline.
☆129Jul 13, 2024Updated 2 years ago
thu-ml / TetraJet-MXFP4Training
View on GitHub
Pytorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training
☆40May 4, 2026Updated 2 months ago
GageHowe / wat
View on GitHub
a tool like just and entr. A minimal cross-platform, config-driven tool for running commands whenever files change
☆15Jun 4, 2026Updated last month
Dao-AILab / sonic-moe
View on GitHub
Accelerating MoE with IO and Tile-aware Optimizations
☆733Jul 4, 2026Updated 3 weeks ago
dmis-lab / Outlier-Safe-Pre-Training
View on GitHub
[ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
☆39Nov 4, 2025Updated 8 months ago
hustzxd / EfficientPyTorch
View on GitHub
A PyTorch Framework for Efficient Pruning and Quantization for specialized accelerators.
☆37Dec 5, 2021Updated 4 years ago
jxzhn / supply-chain
View on GitHub
基于FISCO-BCOS区块链的供应链demo，使用node.js构建后端
☆10Jan 28, 2021Updated 5 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
pingxue-hfut / sd-bnn
View on GitHub
Self-Distribution BNN
☆10Mar 8, 2022Updated 4 years ago
huanranchen / LLMLandscape
View on GitHub
The loss landscape of Large Language Models resemble basin!
☆41Jul 8, 2025Updated last year
thkkk / GANSketching_Jittor
View on GitHub
Implementation of Sketch Your Own GAN in Jittor
☆10Jan 2, 2022Updated 4 years ago
elliothe / Ternarized_Neural_Network
View on GitHub
Optimizing Deep Convolutional Neural Network with Ternarized Weights and High Accuracy
☆16Jan 27, 2019Updated 7 years ago
kriskrisliu / PAT
View on GitHub
[AAAI 2025] PAT: Pruning-Aware Tuning for Large Language Models
☆37Feb 1, 2025Updated last year
tongxuluo / prts
View on GitHub
Code and Model for NeurIPS 2024 Spotlight Paper "Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training…
☆44Oct 16, 2024Updated last year
haizhongzheng / LTE
View on GitHub
☆13Oct 13, 2025Updated 9 months ago