recursal / RADLADS-paper
RADLADS training code
☆36 · Updated 8 months ago
Alternatives and similar repositories for RADLADS-paper
Users interested in RADLADS-paper are comparing it to the libraries listed below.
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆53 · Updated 3 weeks ago
- Awesome Triton Resources ☆39 · Updated 9 months ago
- Here we will test various linear attention designs. ☆62 · Updated last year
- A large-scale RWKV v7 (World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy… ☆47 · Updated 3 months ago
- Stick-breaking attention ☆62 · Updated 7 months ago
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… ☆30 · Updated last week
- Fast and memory-efficient exact attention ☆75 · Updated 11 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated last year
- Continuous batching and parallel acceleration for RWKV6 ☆22 · Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆229 · Updated 7 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆114 · Updated last week
- Transformers components but in Triton ☆34 · Updated 8 months ago
- ☆27 · Updated 6 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 · Updated last year
- RWKV-7: Surpassing GPT ☆104 · Updated last year
- ☆54 · Updated last year
- ☆44 · Updated 3 months ago
- This repository contains code for the MicroAdam paper. ☆22 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆92 · Updated 6 months ago
- ☆66 · Updated 10 months ago
- RWKV, in easy to read code ☆72 · Updated 10 months ago
- ☆35 · Updated last year
- ☆32 · Updated last year
- Triton implementation of bi-directional (non-causal) linear attention ☆64 · Updated last year
- ☆57 · Updated last year
- QuIP quantization ☆61 · Updated last year
- ☆63 · Updated 7 months ago
- Experiments on the impact of depth in transformers and SSMs. ☆40 · Updated 3 months ago
- ☆132 · Updated 8 months ago