kroggen / mamba-cpu
Modified Mamba code to run on CPU
☆26 · Updated 8 months ago
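For context, the reference Mamba implementation depends on a fused CUDA selective-scan kernel, which is the main obstacle to CPU execution. Below is a minimal, illustrative sketch of the equivalent sequential recurrence in plain PyTorch; the function name `selective_scan`, the shapes, and the parameter names follow the Mamba paper, not this repository's actual code:

```python
# Minimal sequential selective-scan, the recurrence at the heart of Mamba.
# Runs on CPU because it uses only standard PyTorch ops, no custom kernel.
# Illustrative sketch only; not taken from kroggen/mamba-cpu.
import torch

def selective_scan(x, delta, A, B, C, D):
    """x, delta: (batch, length, d_inner)
    A: (d_inner, d_state); B, C: (batch, length, d_state); D: (d_inner,)"""
    b, l, d = x.shape
    # Discretize the continuous-time SSM parameters (zero-order hold).
    dA = torch.exp(delta.unsqueeze(-1) * A)                       # (b, l, d, n)
    dBx = delta.unsqueeze(-1) * B.unsqueeze(2) * x.unsqueeze(-1)  # (b, l, d, n)
    h = x.new_zeros(b, d, A.shape[1])                             # hidden state
    ys = []
    for t in range(l):  # plain Python loop in place of the fused CUDA scan
        h = dA[:, t] * h + dBx[:, t]
        ys.append(torch.einsum("bdn,bn->bd", h, C[:, t]))
    return torch.stack(ys, dim=1) + x * D                         # skip connection

# Tiny CPU smoke test with random tensors
b, l, d, n = 1, 8, 4, 3
y = selective_scan(torch.randn(b, l, d), torch.rand(b, l, d),
                   -torch.rand(d, n), torch.randn(b, l, n),
                   torch.randn(b, l, n), torch.randn(d))
print(y.shape)  # torch.Size([1, 8, 4])
```

The Python-level loop is far slower than the fused kernel, but it runs anywhere PyTorch does, which is the trade-off a CPU port makes.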
Related projects:
- Implementation of Mamba in Rust ☆69 · Updated 6 months ago
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta ☆103 · Updated last week
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆120 · Updated last week
- RWKV in nanoGPT style ☆170 · Updated 3 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆153 · Updated last week
- Inference of Mamba models in pure C ☆176 · Updated 6 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆185 · Updated 3 weeks ago
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Zeta ☆73 · Updated last week
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆46 · Updated 5 months ago
- Minimal Mamba-2 implementation in PyTorch ☆89 · Updated 3 months ago
- PB-LLM: Partially Binarized Large Language Models ☆143 · Updated 10 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated 2 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆73 · Updated 3 weeks ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆258 · Updated 10 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆38 · Updated 3 months ago
- Code repository for Black Mamba ☆218 · Updated 7 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆158 · Updated 2 months ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆68 · Updated 2 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆56 · Updated this week
- ☆169 · Updated this week
- Structural Pruning for LLaMA ☆55 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆118 · Updated 2 weeks ago
- ☆190 · Updated last week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆206 · Updated last month
- Implementation of the Mamba SSM with hf_integration. ☆55 · Updated 2 weeks ago
- PyTorch implementation of models from the Zamba2 series. ☆63 · Updated last month
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆105 · Updated 3 weeks ago
- Here we will test various linear attention designs. ☆55 · Updated 4 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆57 · Updated 4 months ago
- QuIP quantization ☆41 · Updated 6 months ago