pabloiyu / mini-language-model
Implements a Mamba SSM in a mini language model and trains it on the public-domain works of Sherlock Holmes. Also includes an implementation of parallel adapters in a transformer, and code to run a quantized version of Mistral-7B.
☆9 · Updated 10 months ago
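As a rough illustration of the quantized-inference part of this repo, below is a minimal sketch of loading a 4-bit quantized Mistral-7B with Hugging Face transformers and bitsandbytes. This is not the repository's actual code; the checkpoint id, quantization settings, and prompt are assumptions for demonstration.

```python
# Minimal sketch: 4-bit quantized Mistral-7B inference via transformers +
# bitsandbytes. Checkpoint id, config values, and prompt are illustrative
# assumptions, not taken from the repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # assumed checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_compute_dtype=torch.float16,  # run matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                     # place layers on available devices
)

prompt = "Sherlock Holmes deduced that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```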
Alternatives and similar repositories for mini-language-model:
Users interested in mini-language-model are comparing it to the repositories listed below
- An open-source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆27 · Updated this week
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated last month
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆50 · Updated 9 months ago
- Here we will test various linear attention designs. ☆58 · Updated 9 months ago
- Lottery Ticket Adaptation ☆37 · Updated 2 months ago
- Implementation of a modular, high-performance, and simple Mamba for high-speed applications ☆33 · Updated 2 months ago
- ☆44 · Updated 6 months ago
- A repository for research on medium-sized language models. ☆76 · Updated 8 months ago
- RWKV6 in native PyTorch and Triton :) ☆11 · Updated 5 months ago
- A reproduction of the paper "Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction" ☆22 · Updated 8 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆38 · Updated 11 months ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆17 · Updated 3 weeks ago
- Course Project for COMP4471 on RWKV ☆17 · Updated 11 months ago
- Training hybrid models for dummies. ☆18 · Updated 2 weeks ago
- Structural Pruning for LLaMA ☆54 · Updated last year
- Trying to deconstruct RWKV in understandable terms ☆14 · Updated last year
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 10 months ago
- Fast LLM training codebase with dynamic strategy choosing [DeepSpeed+Megatron+FlashAttention+CudaFusionKernel+Compiler] ☆36 · Updated last year
- ☆32 · Updated last year
- Run ONNX RWKV-v4 models with GPU acceleration using DirectML [Windows], or just on CPU [Windows AND Linux]; Limited to 430M model at this… ☆20 · Updated last year
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆15 · Updated 2 months ago
- CursorCore: Assist Programming through Aligning Anything ☆80 · Updated 3 months ago
- Here we collect trick questions and failed tasks for open source LLMs to improve them. ☆32 · Updated last year
- Efficient and Scalable Estimation of Tool Representations in Vector Space ☆18 · Updated 4 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆42 · Updated 6 months ago
- RWKV-7: Surpassing GPT ☆73 · Updated 2 months ago
- A large-scale RWKV v6, v7 inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy on Docker. Supports tr… ☆25 · Updated last week
- ☆27 · Updated 5 months ago