recursal / RADLADS-paperLinks

RADLADS training code

☆34

Alternatives and similar repositories for RADLADS-paper

Users that are interested in RADLADS-paper are comparing it to the libraries listed below

Sorting:

howard-hou / RWKV-X
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…
☆51Updated 4 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
OpenMOSE / RWKV-Infer
A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy…
☆45Updated last month
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆74Updated 9 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year
johanwind / wind_rwkv
☆27Updated 4 months ago
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆38Updated 7 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆212Updated 5 months ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆101Updated last year
00ffcc / chunkRWKV6
continous batching and parallel acceleration for RWKV6
☆22Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆106Updated last month
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆91Updated 4 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆61Updated 5 months ago
Joluck / MiSS
MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc…
☆25Updated last month
dame-cell / Triformer
Transformers components but in Triton
☆34Updated 6 months ago
SmerkyG / gptcore
Fast modular code to create and train cutting edge LLMs
☆68Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 10 months ago
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆73Updated last year
SmerkyG / RWKV_Explained
RWKV, in easy to read code
☆72Updated 8 months ago
insuhan / hyper-attn
☆83Updated 2 years ago
wdlctc / mini-s
☆53Updated last year
kyleliang919 / Super_Muon
☆66Updated 8 months ago
berlino / seq_icl
☆53Updated last year
HanGuo97 / log-linear-attention
☆256Updated 6 months ago
test-time-training / ttt-tk
☆41Updated last month
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33Updated last year
SalesforceAIResearch / GemFilter
☆85Updated 3 weeks ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆243Updated 6 months ago