Adamdad / rational_kat_cu
☆47 · Updated last week
Alternatives and similar repositories for rational_kat_cu:
Users interested in rational_kat_cu are comparing it to the libraries listed below.
- A repository for DenseSSMs ☆86 · Updated 10 months ago
- ☆45 · Updated 10 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆42 · Updated last week
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆122 · Updated 2 weeks ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆36 · Updated 7 months ago
- A Triton kernel for incorporating bi-directionality in Mamba2 ☆60 · Updated last month
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆35 · Updated 4 months ago
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o… ☆44 · Updated 2 months ago
- The official implementation of "Autoregressive Pretraining with Mamba in Vision" ☆68 · Updated 7 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆50 · Updated last year
- [ICLR 2024 Spotlight] The official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ☆58 · Updated 8 months ago
- Official implementation of the paper "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" ☆23 · Updated 6 months ago
- State Space Models ☆64 · Updated 9 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆52 · Updated 5 months ago
- A More Fair and Comprehensive Comparison between KAN and MLP ☆159 · Updated 5 months ago
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" ☆211 · Updated 8 months ago
- The official PyTorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT … ☆32 · Updated 11 months ago
- Awesome list of papers that extend Mamba to various applications ☆131 · Updated last month
- [ICML 2024 Oral] The official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆60 · Updated 10 months ago
- ☆16 · Updated last year
- More dimensions = More fun ☆21 · Updated 6 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆92 · Updated 7 months ago
- An efficient PyTorch implementation of selective scan in one file, working on both CPU and GPU, with corresponding mathematical derivatio… ☆79 · Updated 11 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆96 · Updated 5 months ago
- Second Generation of the MAMBA Software ☆28 · Updated 4 months ago
- ☆26 · Updated 3 weeks ago
- (NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models ☆16 · Updated 2 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 · Updated 2 weeks ago
- (NeurIPS 2023) PyTorch implementation of "Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation" ☆19 · Updated 4 months ago
- [ICLR 2024] Improving Convergence and Generalization Using Parameter Symmetries ☆29 · Updated 8 months ago