Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels
☆112 · Jul 31, 2023 · Updated 2 years ago
Alternatives and similar repositories for FlashAttention20
Users who are interested in FlashAttention20 are comparing it to the libraries listed below.
- Implementation of FlashAttention in PyTorch · ☆181 · Jan 12, 2025 · Updated last year
- Triton implementation of Flash Attention 2.0 · ☆50 · Jul 31, 2023 · Updated 2 years ago
- Community implementation of the paper "Multi-Head Mixture-of-Experts" in PyTorch · ☆29 · Jan 31, 2026 · Updated last month
- Some microbenchmarks and design docs before commencement · ☆12 · Feb 1, 2021 · Updated 5 years ago
- A simple PyTorch implementation of Flash Multi-Head Attention · ☆22 · Feb 5, 2024 · Updated 2 years ago
- Batch document loader into Quivr (https://github.com/StanGirard/quivr) · ☆14 · Jun 25, 2023 · Updated 2 years ago
- ☆13 · Apr 25, 2025 · Updated 10 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. · ☆169 · Aug 14, 2024 · Updated last year
- Implement FlashAttention v2 with minimal code to learn. · ☆15 · Jun 12, 2024 · Updated last year
- Elastic Workplace Search Official Python Client · ☆10 · Aug 8, 2024 · Updated last year
- Simple PyTorch profiler that combines DeepSpeed Flops Profiler and TorchInfo · ☆11 · Feb 12, 2023 · Updated 3 years ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks · ☆16 · Nov 11, 2024 · Updated last year
- Repository for the ACL 2020 paper "Refer360°: A Referring Expression Recognition Dataset in 360° Images" · ☆13 · Jun 26, 2021 · Updated 4 years ago
- An implementation of parameter server framework in PyTorch RPC. · ☆12 · Nov 12, 2021 · Updated 4 years ago
- ☆15 · Sep 28, 2022 · Updated 3 years ago
- The open source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers" · ☆19 · Mar 11, 2024 · Updated last year
- Multi-Modal Tree of Thoughts for DALLE-3-like auto self-improvement · ☆17 · Nov 11, 2024 · Updated last year
- “Open terminals”, “load CSVs”, “start hacking” · ☆16 · May 2, 2017 · Updated 8 years ago
- Implementation of VisionLLaMA from the paper: "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta · ☆16 · Nov 11, 2024 · Updated last year
- Using multiple LLMs for ensemble Forecasting · ☆16 · Jan 17, 2024 · Updated 2 years ago
- ☆30 · Mar 26, 2025 · Updated 11 months ago
- The official implementation of the DAC 2024 paper GQA-LUT · ☆21 · Dec 20, 2024 · Updated last year
- ☆17 · Feb 19, 2024 · Updated 2 years ago
- Awesome Chinese Corpus Datasets and Models. · ☆18 · Oct 28, 2019 · Updated 6 years ago
- PyTorch implementation of Gaussian word embeddings · ☆19 · Apr 7, 2018 · Updated 7 years ago
- ☆23 · Jan 24, 2024 · Updated 2 years ago
- ☆42 · Aug 30, 2018 · Updated 7 years ago
- ☆38 · Jan 15, 2021 · Updated 5 years ago
- Streamlit Tutorial (ex: stock price dashboard, cartoon-stylegan, vqgan-clip, stylemixing, styleclip, sefa) · ☆40 · Feb 21, 2023 · Updated 3 years ago
- Fast and memory-efficient exact attention · ☆22,460 · Updated this week
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" · ☆40 · Nov 11, 2024 · Updated last year
- Code for the arXiv paper "Semi-supervised Few-shot Atomic Action Recognition". · ☆18 · Jan 2, 2021 · Updated 5 years ago
- Yuque Python SDK & command-line interface · ☆17 · Sep 11, 2019 · Updated 6 years ago
- Simple Model Similarities Analysis · ☆21 · Feb 3, 2024 · Updated 2 years ago
- Implementation of Proximal Policy Optimization in Jax+Flax · ☆21 · May 18, 2023 · Updated 2 years ago
- Flash Attention in ~100 lines of CUDA (forward pass only) · ☆1,085 · Dec 30, 2024 · Updated last year
- Code for ICCV2021: Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection · ☆28 · Oct 12, 2021 · Updated 4 years ago
- Just a little project for fast face swapping using one picture · ☆22 · Jun 9, 2023 · Updated 2 years ago
- PyTorch Implementation of NeurIPS 2020 paper "Learning Sparse Prototypes for Text Generation" · ☆22 · Jul 8, 2021 · Updated 4 years ago