nanowell / Q-Sparse-LLM
My implementation of "Q-Sparse: All Large Language Models can be Fully Sparsely-Activated"
☆30 · Updated 3 months ago
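For context on what the implementation covers: the Q-Sparse paper sparsifies the activations entering each linear projection with a hard top-K mask, and trains through the mask with a straight-through estimator so gradients still reach every entry. Below is a minimal PyTorch sketch of that mechanism as I read the paper; it is illustrative only (the function name and tensor shapes are made up), not code from this repository.

```python
import torch

def topk_sparsify(x: torch.Tensor, k: int) -> torch.Tensor:
    # Hard top-K mask: keep the k largest-magnitude entries per row.
    _, idx = torch.topk(x.abs(), k, dim=-1)
    mask = torch.zeros_like(x).scatter(-1, idx, 1.0)
    sparse = x * mask
    # Straight-through estimator: the forward pass sees the sparsified
    # activations, while the backward pass treats the masking as
    # identity, so the gradient reaches all entries of x.
    return x + (sparse - x).detach()

# Illustrative use: sparsify activations before a linear projection.
x = torch.randn(4, 1024, requires_grad=True)   # (batch, hidden)
w = torch.randn(1024, 4096)
y = topk_sparsify(x, k=256) @ w                # ~25% of inputs active
y.sum().backward()                             # grads flow to all of x
```

The `x + (sparse - x).detach()` trick is what makes the hard mask trainable: the masked tensor is used in the forward computation, but autograd differentiates it as if it were the dense input.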
Related projects
Alternatives and complementary repositories for Q-Sparse-LLM
- GoldFinch and other hybrid transformer components ☆39 · Updated 4 months ago
- A repository for research on medium-sized language models. ☆74 · Updated 5 months ago
- ☆35 · Updated 3 weeks ago
- QuIP quantization ☆46 · Updated 8 months ago
- ☆62 · Updated 3 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ☆43 · Updated 4 months ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆42Updated last week
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆92Updated last month
- Here we will test various linear attention designs.☆56Updated 6 months ago
- Collection of autoregressive model implementation☆67Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆104Updated last month
- Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit☆63Updated last year
- ☆63 · Updated last month
- ☆27 · Updated 5 months ago
- Triton implementation of the HyperAttention algorithm ☆46 · Updated 11 months ago
- This repository contains code for the MicroAdam paper. ☆12 · Updated 4 months ago
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch ☆52 · Updated last week
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Updated last year
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆38 · Updated 10 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆129 · Updated 2 months ago
- SparseGPT + GPTQ compression of LLMs such as LLaMA, OPT, and Pythia ☆41 · Updated last year
- Using FlexAttention to compute attention with different masking patterns (a minimal sketch follows this list) ☆40 · Updated last month
- ☆40 · Updated 2 weeks ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆19 · Updated 2 months ago
- ☆45 · Updated 9 months ago
- Linear Attention Sequence Parallelism (LASP) ☆64 · Updated 5 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆56 · Updated last month
- ☆35 · Updated 9 months ago
- ☆46 · Updated last week
- Repository for CPU Kernel Generation for LLM Inference ☆25 · Updated last year
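On the FlexAttention entry above: FlexAttention (available in PyTorch 2.5+ under `torch.nn.attention.flex_attention`) expresses masking patterns as a `score_mod` callback applied to the attention scores. A minimal causal-masking sketch, with made-up tensor shapes:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def causal(score, b, h, q_idx, kv_idx):
    # Keep the score where the query position can attend to the key
    # position; otherwise force it to -inf so softmax gives it zero weight.
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

q = k = v = torch.randn(1, 8, 128, 64)  # (batch, heads, seq_len, head_dim)
out = flex_attention(q, k, v, score_mod=causal)
# In practice you would wrap flex_attention in torch.compile so the
# score_mod callback is fused into a single attention kernel.
```

Other masking patterns (sliding-window, prefix-LM, document masking) follow the same pattern: only the predicate inside the callback changes.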