kyegomez / FlashAttention20
Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels
☆112 · Jul 31, 2023 · Updated 2 years ago
Alternatives and similar repositories for FlashAttention20
Users interested in FlashAttention20 are comparing it to the libraries listed below
- Implementation of FlashAttention in PyTorch ☆180 · Jan 12, 2025 · Updated last year
- Triton implementation of Flash Attention2.0 ☆50 · Jul 31, 2023 · Updated 2 years ago
- Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch ☆29 · Jan 31, 2026 · Updated 2 weeks ago
- Some microbenchmarks and design docs before commencement ☆12 · Feb 1, 2021 · Updated 5 years ago
- A simple PyTorch implementation of Flash MultiHead Attention ☆21 · Feb 5, 2024 · Updated 2 years ago
- ☆13 · Apr 25, 2025 · Updated 9 months ago
- Batch document loader into Quivr (https://github.com/StanGirard/quivr) ☆14 · Jun 25, 2023 · Updated 2 years ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 6 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆167 · Aug 14, 2024 · Updated last year
- Implement FlashAttention v2 with minimal code to learn. ☆14 · Jun 12, 2024 · Updated last year
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo ☆11 · Feb 12, 2023 · Updated 3 years ago
- Elastic Workplace Search Official Python Client ☆10 · Aug 8, 2024 · Updated last year
- Deep Learning for Video Retrieval by Natural Language ☆11 · Oct 20, 2019 · Updated 6 years ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks ☆16 · Nov 11, 2024 · Updated last year
- An implementation of parameter server framework in PyTorch RPC. ☆12 · Nov 12, 2021 · Updated 4 years ago
- ☆14 · Aug 29, 2023 · Updated 2 years ago
- ☆35 · Oct 21, 2023 · Updated 2 years ago
- ☆15 · Sep 28, 2022 · Updated 3 years ago
- The open source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers" ☆19 · Mar 11, 2024 · Updated last year
- Multi-Modal Tree of thoughts for DALLE-3 like auto self improvement ☆17 · Nov 11, 2024 · Updated last year
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta ☆16 · Nov 11, 2024 · Updated last year
- “Open terminals”, “load CSVs”, “start hacking” ☆16 · May 2, 2017 · Updated 8 years ago
- ☆17 · Feb 19, 2024 · Updated last year
- Using multiple LLMs for ensemble Forecasting ☆16 · Jan 17, 2024 · Updated 2 years ago
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference. ☆46 · Jun 11, 2025 · Updated 8 months ago
- Awesome Chinese Corpus Datasets and Models. ☆18 · Oct 28, 2019 · Updated 6 years ago
- ☆23 · Jan 24, 2024 · Updated 2 years ago
- ☆39 · Dec 14, 2025 · Updated 2 months ago
- ☆17 · Dec 9, 2022 · Updated 3 years ago
- PyTorch implementation of Gaussian word embeddings ☆19 · Apr 7, 2018 · Updated 7 years ago
- Generate High Quality textual or multi-modal datasets with Agents ☆18 · Jun 7, 2023 · Updated 2 years ago
- ☆38 · Jan 15, 2021 · Updated 5 years ago
- Fast and memory-efficient exact attention ☆22,231 · Updated this week
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆40 · Nov 11, 2024 · Updated last year
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆1,068 · Dec 30, 2024 · Updated last year
- Simple Model Similarities Analysis ☆21 · Feb 3, 2024 · Updated 2 years ago
- 语雀 Yuque Python SDK & command-line interface ☆17 · Sep 11, 2019 · Updated 6 years ago
- Codes for arXiv paper "Semi-supervised Few-shot Atomic Action Recognition". ☆18 · Jan 2, 2021 · Updated 5 years ago
- Implementation of Proximal Policy Optimization in Jax+Flax ☆21 · May 18, 2023 · Updated 2 years ago