Adamdad / rational_kat_cu
☆47 · Updated last week
Alternatives and similar repositories for rational_kat_cu:
Users interested in rational_kat_cu are comparing it to the libraries listed below.
- A repository for DenseSSMs ☆86 · Updated 10 months ago
- ☆45 · Updated 10 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆42 · Updated last week
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆122 · Updated 2 weeks ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆36 · Updated 7 months ago
- A Triton kernel for incorporating bi-directionality in Mamba2 ☆60 · Updated last month
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆35 · Updated 4 months ago
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o… ☆44 · Updated 2 months ago
- The official implementation of "Autoregressive Pretraining with Mamba in Vision" ☆68 · Updated 7 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆50 · Updated last year
- [ICLR 2024 Spotlight] The official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ☆58 · Updated 8 months ago
- Official implementation of the paper "DeciMamba: Exploring the Length Extrapolation Potential of Mamba" ☆23 · Updated 6 months ago
- State Space Models ☆64 · Updated 9 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆52 · Updated 5 months ago
- A More Fair and Comprehensive Comparison between KAN and MLP ☆159 · Updated 5 months ago
- Official PyTorch implementation of "The Hidden Attention of Mamba Models" ☆211 · Updated 8 months ago
- The official PyTorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT … ☆32 · Updated 11 months ago
- Awesome list of papers that extend Mamba to various applications ☆131 · Updated last month
- [ICML 2024 Oral] The official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆60 · Updated 10 months ago
- ☆16 · Updated last year
- More dimensions = More fun ☆21 · Updated 6 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ☆92 · Updated 7 months ago
- An efficient PyTorch implementation of selective scan in one file, working on both CPU and GPU, with corresponding mathematical derivatio… ☆79 · Updated 11 months ago
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆96 · Updated 5 months ago
- Second Generation of the MAMBA Software ☆28 · Updated 4 months ago
- ☆26 · Updated 3 weeks ago
- (NeurIPS 2024) BiDM: Pushing the Limit of Quantization for Diffusion Models ☆16 · Updated 2 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 · Updated 2 weeks ago
- (NeurIPS 2023) PyTorch implementation of "Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation" ☆19 · Updated 4 months ago
- [ICLR 2024] Improving Convergence and Generalization Using Parameter Symmetries ☆29 · Updated 8 months ago