thu-ml/2by4-pretrain-acc-examples

Code for "Accelerating Transformer Pre-training with 2:4 Sparsity"

☆28

Alternatives and similar repositories for 2by4-pretrain-acc-examples

Users interested in 2by4-pretrain-acc-examples are comparing it to the libraries listed below.

yuezhouhu / 2by4-pretrain
Efficient 2:4 sparse training algorithms and implementations
☆62 · Dec 8, 2024 · Updated last year
yuezhouhu / residual-context-diffusion
[ICML 2026] Residual Context Diffusion (RCD): Repurposing discarded signals as structured priors for high-performance reasoning in dLLMs.
☆59 · Jun 28, 2026 · Updated last month
yuezhouhu / adaspec
A selective knowledge distillation algorithm for efficient speculative decoders
☆39 · Nov 27, 2025 · Updated 8 months ago
thu-ml / Adaptive-Sparse-Trainer
Official implementation for "Pruning Large Language Models with Semi-Structural Adaptive Sparse Training" (AAAI 2025)
☆19 · Jul 1, 2025 · Updated last year
thu-ml / TetraJet-v2-NVFP4Training
[ICML 2026 Spotlight] Official implementation of TetraJet-v2: Accurate NVFP4 Training for LLMs, with fully-NVFP4 linear layer with unbias…
☆16 · Jul 3, 2026 · Updated 3 weeks ago
Adlik / model_zoo
☆11 · Dec 26, 2025 · Updated 7 months ago
Su-my / TRAPO
The official repository for Trust-Region Adaptive Policy Optimization (TRAPO) – a novel hybrid framework designed to enhance large langua…
☆16 · Mar 2, 2026 · Updated 4 months ago
aojunzz / NM-sparsity
☆245 · Nov 9, 2022 · Updated 3 years ago
jxzhn / supply-chain
A supply chain demo based on the FISCO-BCOS blockchain, with a backend built on node.js
☆10 · Jan 28, 2021 · Updated 5 years ago
sast-summer-training-2023 / sast-summer-training-2023.github.io
Summer Training 2023, SAST 9.
☆42 · Aug 15, 2023 · Updated 2 years ago
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆120 · Dec 10, 2024 · Updated last year
hotwords123 / oi-code-collector
A simple OI problem submission server
☆11 · Dec 12, 2025 · Updated 7 months ago
abhibambhaniya / progressive_gradient_flow_nm_sparsity
Implementation of NM sparsity recipe presented in the paper "Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers".
☆11 · Feb 5, 2024 · Updated 2 years ago
Mixture-AI / Mixture-of-Depths
Google DeepMind: Mixture of Depths Unofficial Implementation.
☆12 · May 29, 2024 · Updated 2 years ago
KrishnaswamyLab / LM-Dispersion
[ICML 2026] Dispersion loss counteracts embedding condensation and improves generalization in small language models
☆18 · May 21, 2026 · Updated 2 months ago
papers-submission / structured_transposable_masks
Code for ICML 2021 submission
☆35 · Mar 24, 2021 · Updated 5 years ago
Lightning-Universe / lightning-Hivemind
Lightning Training strategy for HiveMind
☆18 · Jan 20, 2026 · Updated 6 months ago
Ther-nullptr / circult-eda-mlsys-tinyml-arxiv-daily
🎓Automatically Update circult-eda-mlsys-tinyml Papers Daily using Github Actions (Update Every 8th hours)
☆10 · Updated this week
HannesHaglund / Swiss
Swiss-system tournament manager in python 3.
☆11 · Aug 13, 2021 · Updated 4 years ago
istoony / winograd-convolutional-nn
I'm going to use the Winograd’s minimal filtering algorithms to introduce a new class of fast algorithms for convolutional neural networks…
☆12 · Mar 22, 2018 · Updated 8 years ago
xunzhang1128 / Q-DiT4SR
[ICML 2026] Q-DiT4SR: Exploration of Detail-Preserving Diffusion Transformer Quantization for Real-World Image Super-Resolution
☆19 · May 1, 2026 · Updated 2 months ago
macloo / flask-forms
Very basic Flask application with an interactive form, using Flask-WTF and Flask-Bootstrap
☆11 · Mar 26, 2017 · Updated 9 years ago
stepbuystep / LightNAS
You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms
☆12 · Apr 17, 2023 · Updated 3 years ago
hysts / pytorch_yolov3
A PyTorch Implementation of YOLOv3
☆14 · Apr 16, 2019 · Updated 7 years ago
dropbox / low-rank-llama2
Low-Rank Llama Custom Training
☆23 · Mar 27, 2024 · Updated 2 years ago
Ucas-HaoranWei / Aircraft-KP
Keypoint dataset for airplane
☆10 · Dec 28, 2019 · Updated 6 years ago
JingzheShi / GaiLunGPT
A collection of ways to use GPT to help complete assignments for university courses such as "Introduction to ..." and "Principles of ...". An Aggregation of ways to use GPT to help with (noncritical) parts of homework assignments in some …
☆19 · Sep 7, 2023 · Updated 2 years ago
thunlp / BlockFFN
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19 · Jan 10, 2026 · Updated 6 months ago
AHSFNU / syzoj
An online judge system for algorithm competition.
☆15 · Jun 24, 2023 · Updated 3 years ago
tenstorrent / tt-buda-benchmarks
Repository for AI model benchmarking on TT-Buda
☆16 · Feb 9, 2026 · Updated 5 months ago
tongxuluo / prts
Code and Model for NeurIPS 2024 Spotlight Paper "Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training…
☆44 · Oct 16, 2024 · Updated last year
pprp / STBLLM
[ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
☆20 · Jun 3, 2025 · Updated last year
haochengxi / Train_Transformers_with_INT4
☆157 · Jun 22, 2023 · Updated 3 years ago
tenstorrent / sfpi
☆14 · Jul 17, 2026 · Updated last week
dasabir / RAiD_Dataset
Re-Identification Across Indoor-Outdoor Dataset (RAiD) - Introduced in the work "Consistent Re-identification in a Camera Network" (ECCV …
☆16 · Nov 26, 2014 · Updated 11 years ago
peytontolbert / simple-moe
Simple MoE - Day 17 of 365 Days of Repos
☆20 · Jun 2, 2026 · Updated last month
byeongjun-park / DTR
[ICLR 2024] Official pytorch implementation of "Denoising Task Routing for Diffusion Models"
☆25 · Feb 19, 2024 · Updated 2 years ago
NUS-HPC-AI-Lab / Dynamic-Diffusion-Transformer
☆96 · Mar 26, 2025 · Updated last year
linhongseba / MaximumClique
The implementation for maximum clique enumeration algorithm
☆11 · Apr 14, 2016 · Updated 10 years ago