Oxen-AI / mamba-diveLinks

This is the code that went into our practical dive using mamba as information extraction

☆57

Alternatives and similar repositories for mamba-dive

Users that are interested in mamba-dive are comparing it to the libraries listed below

Sorting:

tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆60Updated last year
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 6 months ago
johnma2006 / candle
Deep learning library implemented from scratch in numpy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments.
☆53Updated last year
epfml / DenseFormer
☆82Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103Updated 10 months ago
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆105Updated 3 months ago
geronimi73 / phi2-finetune
☆86Updated last year
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆38Updated 5 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52Updated 2 years ago
euclaise / supertrainer2000
☆50Updated last year
CG80499 / KAN-GPT-2
Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆121Updated last year
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆72Updated last year
catid / spectral_ssm
Implementation of Spectral State Space Models
☆16Updated last year
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆38Updated last year
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆45Updated last year
geronimi73 / mamba
☆31Updated last year
KaiNylund / lm-weights-encode-time
☆69Updated last year
Zyphra / Zyda_processing
☆39Updated last year
LegallyCoder / mamba-hf
Implementation of the Mamba SSM with hf_integration.
☆56Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆90Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127Updated last year
kyegomez / MambaByte
Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta
☆125Updated 3 weeks ago
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆66Updated last year
alxndrTL / othello_mamba
Evaluating the Mamba architecture on the Othello game
☆48Updated last year
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning RoPE model on larger sequences than the pre-trained model adapts the model context limit
☆62Updated 2 years ago
Zyphra / BlackMamba
Code repository for Black Mamba
☆259Updated last year
lucidrains / CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind
☆177Updated last year
official-elinas / zeus-llm-trainer
Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models
☆69Updated 2 years ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131Updated 2 weeks ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year