Haiyang-W / TokenFormer
Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
☆211 · Updated this week
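TokenFormer's central idea is to replace each linear projection with attention between input tokens and a learnable set of key/value "parameter tokens", so the model can grow by appending parameter tokens instead of retraining at a larger width. Below is a minimal PyTorch sketch of that token-parameter attention; the module name, sizes, and initialization are illustrative assumptions, and a plain softmax stands in for the modified softmax the paper actually uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenParamAttention(nn.Module):
    """Sketch of TokenFormer-style token-parameter attention: a linear
    projection is replaced by attention over learnable parameter tokens,
    so capacity scales with num_param_tokens rather than layer width."""

    def __init__(self, dim_in: int, dim_out: int, num_param_tokens: int = 1024):
        super().__init__()
        # Learnable key/value parameter tokens stand in for a weight matrix.
        self.key_params = nn.Parameter(torch.randn(num_param_tokens, dim_in) * dim_in ** -0.5)
        self.value_params = nn.Parameter(torch.randn(num_param_tokens, dim_out) * dim_out ** -0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim_in) -> attention scores over parameter tokens.
        scores = x @ self.key_params.t()     # (batch, seq, num_param_tokens)
        attn = F.softmax(scores, dim=-1)     # the paper uses a modified softmax
        return attn @ self.value_params      # (batch, seq, dim_out)
```

Scaling up then amounts to concatenating new rows onto `key_params`/`value_params`; the paper zero-initializes the added tokens so that, under its modified softmax, the pre-trained model's outputs are preserved when training resumes.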
Related projects
Alternatives and complementary repositories for TokenFormer
- Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆169 · Updated last week
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆242 · Updated 6 months ago
- PyTorch implementation of models from the Zamba2 series. ☆158 · Updated this week
- Some preliminary explorations of Mamba's context scaling. ☆190 · Updated 9 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated last month
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆213 · Updated 2 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆240 · Updated this week
- Annotated version of the Mamba paper ☆455 · Updated 8 months ago
- Understand and test language model architectures on synthetic tasks. ☆161 · Updated 6 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆474 · Updated 2 weeks ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆351 · Updated this week
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆111 · Updated 2 months ago
- Helpful tools and examples for working with flex-attention ☆462 · Updated 2 weeks ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆261 · Updated last year
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆171 · Updated 3 weeks ago
- Implementation of Infini-Transformer in Pytorch ☆104 · Updated last month
- Code repository for Black Mamba ☆232 · Updated 9 months ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆161 · Updated 4 months ago
- Implementation of MambaByte, from "MambaByte: Token-free Selective State Space Model", in Pytorch and Zeta ☆108 · Updated last week
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆291 · Updated 4 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model ("Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models") ☆77 · Updated last month
- Official code for "TOAST: Transfer Learning via Attention Steering" ☆186 · Updated last year
- Token Omission Via Attention ☆119 · Updated 3 weeks ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆156 · Updated 10 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 · Updated 6 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" (a minimal sketch of its weight quantization follows this list) ☆154 · Updated 3 weeks ago
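The core step in "The Era of 1-bit LLMs" (BitNet b1.58) is quantizing weights to the ternary set {-1, 0, +1} with a per-tensor absmean scale. A minimal PyTorch sketch of that step follows; the function name and `eps` default are illustrative assumptions, and it omits the activation quantization and straight-through-estimator training the paper pairs it with.

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Ternary (1.58-bit) weight quantization as described in
    "The Era of 1-bit LLMs": scale by the mean absolute value,
    then round and clip each weight to {-1, 0, +1}."""
    scale = w.abs().mean().clamp(min=eps)     # per-tensor absmean scale
    w_q = (w / scale).round().clamp_(-1, 1)   # ternary weight matrix
    return w_q, scale                         # dequantize as w_q * scale
```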