SmerkyG / RWKV_Explained
RWKV, in easy-to-read code
☆55 · Updated this week
Related projects
Alternatives and complementary repositories for RWKV_Explained
- Fast, modular code to create and train cutting-edge LLMs ☆65 · Updated 6 months ago
- RWKV in nanoGPT style ☆177 · Updated 5 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆85 · Updated 6 months ago
- RWKV infctx trainer, for training arbitrary context sizes, to 10k and beyond! ☆133 · Updated 3 months ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆30 · Updated 3 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆59 · Updated 6 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆92 · Updated last month
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆48 · Updated 7 months ago
- Token Omission Via Attention ☆120 · Updated last month
- Evaluating the Mamba architecture on the Othello game ☆43 · Updated 6 months ago
- RWKV-7: Surpassing GPT ☆44 · Updated this week
- Evaluating LLMs with Dynamic Data ☆71 · Updated last week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆214 · Updated 3 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated last month
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆177 · Updated last month
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 · Updated last week
- SparseGPT + GPTQ compression of LLMs like LLaMA, OPT, Pythia ☆41 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆112 · Updated 2 months ago
- Here we will test various linear attention designs. ☆56 · Updated 6 months ago
- This is the official repository for Inheritune. ☆105 · Updated last month
- PB-LLM: Partially Binarized Large Language Models ☆148 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆173 · Updated 4 months ago
- ☆62 · Updated 3 months ago
- Collection of autoregressive model implementations ☆67 · Updated this week
- ☆49 · Updated 8 months ago
- ☆35 · Updated 3 weeks ago
- GoldFinch and other hybrid transformer components ☆39 · Updated 4 months ago
- Spherically merge PyTorch/HF-format language models with minimal feature loss. ☆112 · Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget ☆126 · Updated 7 months ago