Jaykef / ai-algorithms
First-principle implementations of groundbreaking AI algorithms using a wide range of deep learning frameworks, accompanied by supporting research papers.
☆122 · Updated last week
Alternatives and similar repositories for ai-algorithms:
Users interested in ai-algorithms are comparing it to the libraries listed below.
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆155 · Updated this week
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆182 · Updated this week
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆50 · Updated 9 months ago
- ☆243 · Updated 4 months ago
- Naively combining transformers and Kolmogorov-Arnold Networks to learn and experiment ☆35 · Updated 6 months ago
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆113 · Updated 8 months ago
- Explorations into improving ViTArc with Slot Attention ☆37 · Updated 3 months ago
- LoRA and DoRA from Scratch Implementations (see the LoRA sketch after this list) ☆195 · Updated 10 months ago
- ☆122 · Updated 8 months ago
- Normalized Transformer (nGPT) ☆146 · Updated 2 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More (https://arxiv.org/abs/2406.16793; see the optimizer sketch after this list) ☆383 · Updated last month
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" (see the routing sketch after this list) ☆79 · Updated this week
- My attempts at implementing various bits of Sepp Hochreiter's new xLSTM architecture ☆129 · Updated 8 months ago
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints" (see the attention sketch after this list) ☆149 · Updated 8 months ago
- Build high-performance AI models with modular building blocks ☆459 · Updated this week
- Awesome list of papers that extend Mamba to various applications. ☆129 · Updated last month
- Attempt to make the multiple residual streams from ByteDance's Hyper-Connections paper accessible to the public ☆65 · Updated last week
- The official implementation of Tensor ProducT ATTenTion Transformer (T6) ☆261 · Updated this week
- Notes on Mamba and the S4 model (Mamba: Linear-Time Sequence Modeling with Selective State Spaces) ☆159 · Updated last year
- PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model … ☆51 · Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series. ☆173 · Updated this week
- When it comes to optimizers, it's always better to be safe than sorry ☆166 · Updated last week
- A Triton kernel for incorporating bi-directionality in Mamba2 ☆60 · Updated last month
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" ☆211 · Updated 8 months ago
- An easy, reliable, fluid template for Python packages, complete with docs, testing suites, READMEs, GitHub workflows, linting and much more ☆156 · Updated this week
- Minimal Mamba-2 implementation in PyTorch ☆166 · Updated 7 months ago
- Implementation of Agent Attention in PyTorch ☆89 · Updated 6 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers … (see the memory-layer sketch after this list) ☆288 · Updated last month
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated last month
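
For the LoRA/DoRA entry above, here is a minimal LoRA sketch in PyTorch, assuming a frozen pretrained `nn.Linear`. The class name `LoRALinear` and the rank/alpha defaults are illustrative choices, not taken from the linked repository. (DoRA additionally decomposes the weight into magnitude and direction; only LoRA is shown.)

```python
# Minimal LoRA sketch: the frozen base weight W is augmented with a trainable
# low-rank update, y = Wx + (alpha/r) * B(Ax). Names and defaults are illustrative.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.scale = alpha / rank
        # A gets small random init, B starts at zero, so the adapter is a
        # no-op at step 0 and training only learns the delta.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 512))  # (2, 512)
```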
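For the Adam-mini entry, a hedged sketch of the core idea: keep per-coordinate momentum, but replace Adam's per-coordinate second moment with a single scalar per parameter block. The toy optimizer below uses the crude "one block per tensor" partition; the paper partitions parameters more carefully (e.g. per attention head), so treat this as an illustration rather than the official algorithm.

```python
# Toy Adam-mini-style optimizer: one second-moment scalar per parameter
# tensor instead of one per coordinate. Block partitioning is simplified.
import torch

class ToyAdamMini(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            beta1, beta2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if not state:
                    state["step"] = 0
                    state["m"] = torch.zeros_like(p)               # per-coordinate momentum
                    state["v"] = torch.zeros((), device=p.device)  # ONE scalar per block
                state["step"] += 1
                t = state["step"]
                state["m"].mul_(beta1).add_(p.grad, alpha=1 - beta1)
                # single second-moment estimate: mean squared gradient over the block
                state["v"].mul_(beta2).add_(p.grad.pow(2).mean(), alpha=1 - beta2)
                m_hat = state["m"] / (1 - beta1 ** t)
                v_hat = state["v"] / (1 - beta2 ** t)
                p.add_(m_hat / (v_hat.sqrt() + group["eps"]), alpha=-group["lr"])
```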
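For the Mixture-of-Depths entry, a minimal routing sketch under stated assumptions: a learned router picks a top-k subset of tokens per sequence to receive the block's compute, while the remaining tokens ride the residual stream unchanged. The paper's causal-sampling details for autoregressive inference are omitted; `MoDWrapper` and `capacity` are illustrative names.

```python
# Mixture-of-Depths-style token routing sketch: only a capacity fraction of
# tokens is processed by the wrapped block; the rest pass through untouched.
import torch
import torch.nn as nn

class MoDWrapper(nn.Module):
    def __init__(self, block: nn.Module, dim: int, capacity: float = 0.5):
        super().__init__()
        self.block = block        # any (B, T, D) -> (B, T, D) module
        self.router = nn.Linear(dim, 1)
        self.capacity = capacity  # fraction of tokens that get compute

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        k = max(1, int(T * self.capacity))
        scores = self.router(x).squeeze(-1)         # (B, T)
        topk = scores.topk(k, dim=-1).indices       # (B, k) tokens to process
        idx = topk.unsqueeze(-1).expand(-1, -1, D)  # (B, k, D)
        selected = x.gather(1, idx)                 # gather only routed tokens
        # scale the block output by the router score so the router gets gradients
        weight = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        processed = selected + weight * self.block(selected)
        return x.scatter(1, idx, processed)         # unrouted tokens pass through

mod = MoDWrapper(nn.TransformerEncoderLayer(64, 4, batch_first=True), dim=64)
y = mod(torch.randn(2, 16, 64))  # (2, 16, 64)
```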
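For the GQA entry, a minimal grouped-query attention sketch: H query heads share G < H key/value heads, shrinking the KV projections (and KV cache) relative to multi-head attention while keeping more capacity than multi-query attention's single KV head. Dimensions and names below are illustrative assumptions, not the linked repository's API.

```python
# Grouped-query attention sketch: 8 query heads share 2 KV heads, so each
# group of 4 query heads attends to the same shared key/value head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GQA(nn.Module):
    def __init__(self, dim: int, n_heads: int = 8, n_kv_heads: int = 2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.h, self.g = n_heads, n_kv_heads
        self.hd = dim // n_heads
        self.wq = nn.Linear(dim, n_heads * self.hd, bias=False)
        self.wk = nn.Linear(dim, n_kv_heads * self.hd, bias=False)  # fewer KV heads
        self.wv = nn.Linear(dim, n_kv_heads * self.hd, bias=False)
        self.wo = nn.Linear(n_heads * self.hd, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.wq(x).view(B, T, self.h, self.hd).transpose(1, 2)  # (B, H, T, hd)
        k = self.wk(x).view(B, T, self.g, self.hd).transpose(1, 2)  # (B, G, T, hd)
        v = self.wv(x).view(B, T, self.g, self.hd).transpose(1, 2)
        # replicate each KV head across its group of H/G query heads
        k = k.repeat_interleave(self.h // self.g, dim=1)            # (B, H, T, hd)
        v = v.repeat_interleave(self.h // self.g, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.wo(out.transpose(1, 2).reshape(B, T, self.h * self.hd))

attn = GQA(dim=256)
y = attn(torch.randn(2, 10, 256))  # (2, 10, 256)
```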
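For the memory-layers entry, a toy sparse memory layer matching the description above: a large trainable key/value table queried per token, with only the top-k matching slots contributing to the output. Production designs (e.g. product-key memories) factorize the key search so lookup stays cheap even for huge tables; this naive version scores every key for clarity, and all names and sizes are assumptions.

```python
# Toy sparse memory layer: trainable key/value tables, per-token top-k
# lookup, output = residual + softmax-weighted sum of the selected values.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMemoryLayer(nn.Module):
    def __init__(self, dim: int, n_slots: int = 4096, topk: int = 4):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.keys = nn.Parameter(torch.randn(n_slots, dim) * 0.02)    # trainable keys
        self.values = nn.Parameter(torch.randn(n_slots, dim) * 0.02)  # trainable values
        self.topk = topk

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self.query(x)                            # (B, T, D)
        scores = q @ self.keys.T                     # (B, T, n_slots)
        top, idx = scores.topk(self.topk, dim=-1)    # sparse: only k slots fire per token
        w = F.softmax(top, dim=-1)                   # (B, T, k)
        v = self.values[idx]                         # (B, T, k, D)
        return x + (w.unsqueeze(-1) * v).sum(dim=-2) # residual add of retrieved memory

mem = ToyMemoryLayer(dim=128)
y = mem(torch.randn(2, 8, 128))  # (2, 8, 128)
```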