pabloiyu / mini-language-model
Implementation of a Mamba SSM in a mini language model, trained on the public-domain works of Sherlock Holmes. Also includes an implementation of parallel adapters in a transformer, and code to run a quantized version of Mistral-7B.
☆9 · Updated last year
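To picture the parallel-adapter part of the description above, here is a minimal PyTorch sketch of the general idea: a small bottleneck MLP runs in parallel with a (typically frozen) transformer sub-layer and its output is added to the sub-layer's output. This is not code from the repository; all class names, dimensions, and the choice of wrapping a feed-forward block are assumptions made for illustration.

```python
# Minimal parallel-adapter sketch (illustrative; not this repository's code).
import torch
import torch.nn as nn

class ParallelAdapter(nn.Module):
    """Small bottleneck MLP whose output is summed with the wrapped sub-layer's output."""
    def __init__(self, d_model: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.act(self.down(x)))

class AdaptedBlock(nn.Module):
    """Wraps an existing (typically frozen) sub-layer with a parallel adapter."""
    def __init__(self, sublayer: nn.Module, d_model: int = 512):
        super().__init__()
        self.sublayer = sublayer
        self.adapter = ParallelAdapter(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The adapter sees the same input as the sub-layer; outputs are summed.
        return self.sublayer(x) + self.adapter(x)

# Usage: wrap a feed-forward block; only the adapter's parameters would be trained.
ffn = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
block = AdaptedBlock(ffn, d_model=512)
out = block(torch.randn(2, 16, 512))  # (batch, seq, d_model)
```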
Alternatives and similar repositories for mini-language-model
Users interested in mini-language-model are comparing it to the libraries listed below:
- Implementation of a modular, high-performance, and simplistic Mamba for high-speed applications ☆36 · Updated 8 months ago
- Implementation of the Mamba SSM with hf_integration. ☆56 · Updated 11 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆56 · Updated last year
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta ☆120 · Updated 2 weeks ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆16 · Updated 8 months ago
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆31 · Updated last week
- A simpler PyTorch + Zeta Implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series… ☆28 · Updated 8 months ago
- ☆31 · Updated last year
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Ze… ☆109 · Updated last week
- Implementation of MambaFormer in PyTorch + Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆21 · Updated 2 weeks ago
- PyTorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆25 · Updated 2 weeks ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆16 · Updated last year
- ☆38 · Updated 3 months ago
- PyTorch code of "Training a Vision Transformer from scratch in less than 24 hours with 1 GPU" (HiTY workshop at NeurIPS 2022) ☆23 · Updated last year
- Some mixture-of-experts architecture implementations ☆14 · Updated last year
- RWKV-7: Surpassing GPT ☆94 · Updated 8 months ago
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth ☆142 · Updated last week
- Code Implementation, Evaluations, Documentation, Links and Resources for the Min P paper (a minimal sampling sketch follows this list) ☆38 · Updated 4 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆19 · Updated 2 years ago
- Lottery Ticket Adaptation ☆39 · Updated 8 months ago
- Here we collect trick questions and failed tasks for open-source LLMs to improve them. ☆32 · Updated 2 years ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆200 · Updated 2 weeks ago
- An open-source implementation of R1 ☆28 · Updated last week
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated 11 months ago
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated 2 years ago
- OmegaViT (ΩViT) is a cutting-edge vision transformer architecture that combines multi-query attention, rotary embeddings, state space mod… ☆14 · Updated 2 weeks ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆98 · Updated 10 months ago
- RWKV6 in native PyTorch and Triton :) ☆11 · Updated last year
- A repository aimed at pruning DeepSeek V3, R1 and R1-Zero to a usable size ☆64 · Updated 4 months ago
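One of the items above links to resources for the Min P sampling paper. As a rough illustration of the idea (not code from that repository), the sketch below keeps only tokens whose probability is at least `min_p` times the probability of the most likely token, then renormalizes and samples; the threshold value, temperature, and tensor shapes are assumptions.

```python
# Minimal min-p sampling sketch (illustrative; not the linked repository's code).
import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.1, temperature: float = 1.0) -> torch.Tensor:
    probs = torch.softmax(logits / temperature, dim=-1)    # (batch, vocab)
    max_prob, _ = probs.max(dim=-1, keepdim=True)          # per-row top probability
    mask = probs < (min_p * max_prob)                      # tokens below the scaled cutoff
    probs = probs.masked_fill(mask, 0.0)
    probs = probs / probs.sum(dim=-1, keepdim=True)        # renormalize the surviving mass
    return torch.multinomial(probs, num_samples=1)         # (batch, 1) sampled token ids

# Example: sample one token id from a random logit vector over an assumed 32k vocab.
next_token = min_p_sample(torch.randn(1, 32000), min_p=0.1)
```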