idoh / fast_mamba.np

A pure and fast NumPy implementation of Mamba with cache support.

☆17

Related projects:

schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆118 Updated 7 months ago
fairydreaming / farel-bench
Testing LLM reasoning abilities with family relationship quizzes.
☆40 Updated 3 weeks ago
okarthikb / state-space-models
☆27 Updated 2 months ago
jadechip / nanoXLSTM
The simplest, fastest repository for training/finetuning medium-sized xLSTMs.
☆38 Updated 3 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆94 Updated 2 weeks ago
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆27 Updated last month
SmerkyG / RWKV_Explained
RWKV, in easy to read code
☆52 Updated 6 months ago
SmerkyG / gptcore
Fast modular code to create and train cutting edge LLMs
☆63 Updated 4 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆70 Updated last month
NousResearch / StripedHyenaTrainer
☆55 Updated 9 months ago
raphaelsty / LeNLP
NLP with Rust for Python 🦀🐍
☆57 Updated 3 months ago
expz / quiet-star
Implementation of the Quiet-STaR paper (https://arxiv.org/pdf/2403.09629.pdf)
☆27 Updated last month
dvruette / barrel-rec-pytorch
☆53 Updated 8 months ago
serp-ai / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
☆31 Updated 3 months ago
Aleph-Alpha / trigrams
☆29 Updated 3 weeks ago
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆57 Updated 4 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆46 Updated 5 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆82 Updated 3 weeks ago
SkunkworksAI / CodeFusion
☆15 Updated 10 months ago
euclaise / supertrainer2000
☆48 Updated 6 months ago
cognitivecomputations / spectrum
☆75 Updated 3 weeks ago
wdlctc / mini-s
☆26 Updated this week
johnma2006 / candle
Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.
☆47 Updated 5 months ago
OSU-NLP-Group / GrokkedTransformer
Code for the paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆140 Updated 3 months ago
mag- / gpu_benchmark
GPU benchmark
☆35 Updated 2 weeks ago
joey00072 / ohara
Collection of autoregressive model implementations
☆62 Updated 2 weeks ago
LAION-AI / AIW
Alice in Wonderland code base for experiments and raw experiment data
☆96 Updated last week
cognitivecomputations / grokadamw
☆109 Updated last month
AblateIt / finetune-study
Comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tunes.
☆81 Updated last year
flawedmatrix / mamba-ssm
Implementation of Mamba in Rust
☆69 Updated 6 months ago