Oxen-AI / mamba-dive
This is the code that went into our practical deep dive on using Mamba for information extraction.
★50 · Updated 8 months ago
Related projects:
- Collection of autoregressive model implementations (★62, updated 2 weeks ago)
- A single repo with all scripts and utils to train / fine-tune the Mamba model, with or without FIM (★46, updated 5 months ago)
- Small and Efficient Mathematical Reasoning LLMs (★69, updated 7 months ago)
- Deep learning library implemented from scratch in NumPy: Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments (★47, updated 5 months ago)
- Understand and test language model architectures on synthetic tasks (★156, updated 4 months ago)
- Set of scripts to fine-tune LLMs (★36, updated 5 months ago)
- Evaluating the Mamba architecture on the Othello game (★41, updated 4 months ago)
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" (★82, updated 3 weeks ago)
- Implementation of GateLoop Transformer in PyTorch and Jax (★86, updated 3 months ago)
- A byte-level decoder architecture that matches the performance of tokenized Transformers (★57, updated 4 months ago)
- Prune transformer layers (★60, updated 3 months ago)
- A MAD laboratory to improve AI architecture designs 🧪 (★84, updated 4 months ago)
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs (★38, updated 3 months ago)
- Implementation of the Mamba SSM with hf_integration (★55, updated 2 weeks ago)
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters (★94, updated 2 weeks ago)
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling (★153, updated last week)
- Repository for the paper "Stream of Search: Learning to Search in Language" (★70, updated last month)
- Multipack distributed sampler for fast padding-free training of LLMs (★170, updated last month)
- Training small GPT-2-style models using Kolmogorov-Arnold networks (★105, updated 3 months ago)
- Code repository for Black Mamba (★218, updated 7 months ago)
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind (★105, updated 3 weeks ago)
- PyTorch implementation of "Jamba: A Hybrid Transformer-Mamba Language Model" (★120, updated last week)
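Many of the repositories above implement or build on the Mamba selective state-space model. As a point of reference, the core linear recurrence these architectures accelerate can be sketched in a few lines of NumPy. This is an illustrative, simplified sketch only: real Mamba makes the `A`, `B`, `C` parameters input-dependent ("selective") and replaces the Python loop with a hardware-aware fused scan.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Minimal diagonal linear state-space recurrence:
        h_t = A * h_{t-1} + B * u_t
        y_t = C . h_t
    (Illustrative only; not the selective, parallelized Mamba kernel.)
    """
    h = np.zeros(A.shape[0])   # hidden state, one value per state dim
    ys = []
    for u_t in u:              # sequential scan over the input sequence
        h = A * h + B * u_t    # state update (diagonal A, elementwise)
        ys.append(C @ h)       # scalar readout per step
    return np.array(ys)

# Toy example: 4-dim state, scalar input sequence of length 6
rng = np.random.default_rng(0)
A = np.full(4, 0.9)            # stable diagonal transition
B = rng.standard_normal(4)
C = rng.standard_normal(4)
u = rng.standard_normal(6)
y = ssm_scan(A, B, C, u)
print(y.shape)  # (6,)
```

Because the recurrence is linear in `h`, it can also be computed with a parallel (associative) scan rather than this sequential loop, which is the key to Mamba's training-time efficiency.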