thunlp/SparsingLaw

The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".

☆32

Alternatives and similar repositories for SparsingLaw

Users who are interested in SparsingLaw are comparing it to the libraries listed below.

thunlp / Modularity-Analysis
[ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers
☆26 · Jun 7, 2023 · Updated 3 years ago
Zcchill / Value-Residual-Learning
☆15 · Mar 20, 2025 · Updated last year
SJTU-DENG-Lab / AdaMoE
[Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
☆20 · Oct 2, 2024 · Updated last year
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆115 · Oct 11, 2025 · Updated 9 months ago
wmn-231314 / diffusion-data-constraint
Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…
☆127 · Jan 10, 2026 · Updated 6 months ago
ZihaoHuang-notabot / Ultra-Sparse-Memory-Network
☆48 · Jul 3, 2026 · Updated 3 weeks ago
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆50 · Sep 23, 2025 · Updated 10 months ago
SDLAML / disco
☆16 · Dec 11, 2025 · Updated 7 months ago
RobertCsordas / switchhead
☆16 · Jun 11, 2025 · Updated last year
yaof20 / DenseMixer
Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient
☆68 · Aug 3, 2025 · Updated 11 months ago
thunlp / APB
Official Implementation of APB (ACL 2025 main Oral) and Spava (ACL 2026 main).
☆37 · Apr 6, 2026 · Updated 3 months ago
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆20 · Jul 15, 2026 · Updated last week
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year
Leey21 / CipherBank
☆14 · Jun 13, 2025 · Updated last year
MeganTj / multimodal_alignment
☆22 · Jun 4, 2025 · Updated last year
ag1988 / top_k_attention
The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha…
☆70 · Sep 19, 2021 · Updated 4 years ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆43 · Dec 29, 2025 · Updated 6 months ago
AngelaZZZ-611 / reasoning_models_probing
☆21 · May 14, 2026 · Updated 2 months ago
belindal / state-tracking
Code and data for paper "(How) do Language Models Track State?"
☆26 · Mar 31, 2025 · Updated last year
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆118 · Dec 20, 2024 · Updated last year
assafbk / OPRM
Overflow Prevention Enhances Long-Context Recurrent LLMs (COLM 2025)
☆18 · Jul 8, 2025 · Updated last year
jailflip / jailflip-2025
☆22 · Jan 9, 2026 · Updated 6 months ago
AI9Stars / AutoTriton
☆66 · Jul 14, 2025 · Updated last year
uservan / speculative_thinking
☆34 · Oct 13, 2025 · Updated 9 months ago
Raincleared-Song / DejaVu_predictor
The codes for training sparsity predictor on LLaMA.
☆18 · May 12, 2024 · Updated 2 years ago
ozyyshr / RAST
Reasoning Activation in LLMs via Small Model Transfer (NeurIPS 2025)
☆22 · Oct 16, 2025 · Updated 9 months ago
pixeli99 / MixLN
[ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆30 · Jul 24, 2025 · Updated last year
ParCIS / FlashSparse
FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swa…
☆39 · Oct 5, 2025 · Updated 9 months ago
wenquanlu / huginn-latent-cot
[COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-…
☆20 · Oct 4, 2025 · Updated 9 months ago
pzs19 / TokenSelect
☆20 · Mar 11, 2025 · Updated last year
RobertCsordas / moeut
☆93 · Aug 18, 2024 · Updated last year
Longin-Yu / ComRoPE
☆11 · Jun 11, 2025 · Updated last year
AkideLiu / MiniCache
☆14 · Sep 7, 2024 · Updated last year
akhilkedia / TranformersGetStable
[ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"
☆11 · Jul 19, 2024 · Updated 2 years ago
Kwai-Klear / RLEP
RL with Experience Replay
☆59 · Jul 27, 2025 · Updated 11 months ago
OpenSparseLLMs / MoM
☆139 · Feb 4, 2026 · Updated 5 months ago
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29 · Sep 4, 2025 · Updated 10 months ago
facebookresearch / zero
PyTorch Implementation of Zero-Shot Vision Encoder Grafting via LLM Surrogates [ICCV'25]
☆54 · Jul 10, 2025 · Updated last year
Farseer-Scaling-Law / Farseer
☆21 · Jun 12, 2025 · Updated last year