Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆158 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for Zamba2
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆171 · Updated 3 weeks ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆171 · Updated 3 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 3 weeks ago
- Official repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆169 · Updated this week
- The simplest, fastest repository for training/finetuning medium-sized GPTs ☆83 · Updated last week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆212 · Updated 2 months ago
- Official implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆183 · Updated this week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆105 · Updated last week
- ☆116 · Updated 2 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆202 · Updated last week
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆111 · Updated 2 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆91 · Updated last month
- ☆175 · Updated this week
- Code repository for Black Mamba ☆232 · Updated 9 months ago
- ☆182 · Updated 3 weeks ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (official code) ☆133 · Updated last month
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆194 · Updated 6 months ago
- Token Omission Via Attention ☆119 · Updated 3 weeks ago
- The official repository for Inheritune ☆105 · Updated last month
- ☆61 · Updated 2 months ago
- Understand and test language model architectures on synthetic tasks ☆161 · Updated 6 months ago
- ☆44 · Updated 2 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file ☆119 · Updated 2 weeks ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆49 · Updated 7 months ago
- Code for training and evaluating Contextual Document Embedding models ☆92 · Updated this week
- smolLM with Entropix sampler in PyTorch ☆137 · Updated last week
- Fast parallel LLM inference for MLX ☆146 · Updated 4 months ago
- ☆91 · Updated last month
- Some preliminary explorations of Mamba's context scaling ☆190 · Updated 9 months ago