RWKV / RWKV-block
PyTorch implementation of RWKV blocks
☆28Updated last month
Alternatives and similar repositories for RWKV-block:
Users who are interested in RWKV-block are comparing it to the libraries listed below:
- GoldFinch and other hybrid transformer components☆45Updated 9 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276☆27Updated 2 months ago
- Attempt to make multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public☆82Updated 2 months ago
- RWKV-7: Surpassing GPT☆83Updated 5 months ago
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence☆60Updated 3 years ago
- ☆34Updated last month
- research impl of Native Sparse Attention (2502.11089)☆53Updated 2 months ago
- ☆49Updated last year
- Implementation of the Mamba SSM with hf_integration.☆56Updated 7 months ago
- ☆27Updated last year
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)☆50Updated last month
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind☆123Updated 8 months ago
- ☆94Updated 3 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers.☆63Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"☆98Updated 4 months ago
- ☆53Updated last month
- ☆19Updated 3 weeks ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun☆48Updated last month
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".☆40Updated 2 weeks ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto☆55Updated 11 months ago
- ☆14Updated 5 months ago
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆29Updated last week
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆54Updated last year
- Focused on fast experimentation and simplicity☆71Updated 4 months ago
- Official PyTorch Implementation for Paper "No More Adam: Learning Rate Scaling at Initialization is All You Need"☆51Updated 2 months ago
- ☆79Updated last year
- Implementation of a Light Recurrent Unit in PyTorch☆47Updated 6 months ago
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video…☆33Updated 10 months ago
- Collection of autoregressive model implementations☆85Updated 2 months ago
- Griffin MQA + Hawk Linear RNN Hybrid☆85Updated 11 months ago