kyegomez / FlashMHA
A simple PyTorch implementation of Flash MultiHead Attention
☆13Updated 11 months ago
Alternatives and similar repositories for FlashMHA:
Users who are interested in FlashMHA are comparing it to the libraries listed below.
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks☆15Updated 2 months ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …☆15Updated 10 months ago
- Triton implementation of bi-directional (non-causal) linear attention☆35Updated last week
- Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs☆76Updated last month
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation☆46Updated 6 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…☆42Updated 6 months ago
- A simple reproducible template to implement AI research papers☆22Updated 4 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆28Updated 7 months ago
- Repository for CPU Kernel Generation for LLM Inference☆25Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆50Updated 9 months ago
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆79Updated last week
- Linear Attention Sequence Parallelism (LASP)☆74Updated 7 months ago
- My fork of Allen AI's OLMo for educational purposes.☆30Updated last month
- Explorations into improving ViTArc with Slot Attention☆37Updated 3 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts☆38Updated 10 months ago
- Implementation of "PaLM2-VAdapter" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆16Updated 2 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"☆36Updated last year
- My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities", they haven't rel…☆13Updated 11 months ago
- Implementation of Infini-Transformer in Pytorch☆107Updated 2 weeks ago
- The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression>☆110Updated last month
- Lottery Ticket Adaptation☆37Updated 2 months ago
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"☆35Updated 7 months ago
- A minimal implementation of vLLM.☆32Updated 5 months ago
- Here we collect trick questions and failed tasks for open source LLMs to improve them.☆32Updated last year
- A repository for research on medium sized language models.☆76Updated 7 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference…☆18Updated last year