NonvolatileMemory/flash_attn_gqa

NonvolatileMemory / flash_attn_gqa

Triton version of GQA (grouped-query attention) flash attention, based on the tutorial

☆12

Alternatives and similar repositories for flash_attn_gqa

Users who are interested in flash_attn_gqa are comparing it to the libraries listed below.

sail-sg / LightTrans
The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"
☆22 · Apr 22, 2025 · Updated last year
NonvolatileMemory / flash_tree_attn
☆20 · Dec 24, 2024 · Updated last year
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54 · Oct 18, 2024 · Updated last year
unixpickle / torch-bandpass
An implementation of the Prism layer (https://arxiv.org/abs/2011.04823)
☆12 · Nov 13, 2020 · Updated 5 years ago
SparkJiao / MG-PFCM_outfit_rec
Personalized Fashion Compatibility Modeling via Metapath-guided Heterogeneous Graph Learning.
☆16 · Nov 7, 2022 · Updated 3 years ago
ictnlp / FA-DAT
Official Implementation for the ICLR 2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"
☆14 · Mar 1, 2023 · Updated 3 years ago
da03 / criticize_text_generation
A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …
☆12 · Mar 18, 2023 · Updated 3 years ago
LostmanMing / fatigue_driving_detection
Challenge Cup RK board-side code: GStreamer MPP hardware decoding and RKNN deployment of the inference model
☆13 · Sep 12, 2023 · Updated 2 years ago
kyegomez / FlashAttention20Triton
Triton implementation of Flash Attention 2.0
☆54 · Jul 31, 2023 · Updated 2 years ago
ictnlp / NMLA-NAT
Code for NeurIPS 2022 Spotlight paper "Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation"
☆20 · Nov 16, 2022 · Updated 3 years ago
facebookresearch / DisCo
DisCo Transformer for Non-autoregressive MT
☆77 · Jul 28, 2022 · Updated 3 years ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆137 · Mar 21, 2025 · Updated last year
DavideBuffelli / SAME
Code for the papers: "Graph Representation Learning for Multi-Task Settings: a Meta-Learning Approach", "A Meta-Learning Approach for Gra…
☆18 · Apr 26, 2022 · Updated 4 years ago
jadeCurl / HiSS
[AACL 2023] Official implementation of paper "Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompti…
☆21 · Apr 1, 2024 · Updated 2 years ago
zhangjiong724 / spectral-RNN
Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization
☆16 · Jun 5, 2018 · Updated 8 years ago
emalach / LinearLM
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21 · Jul 29, 2024 · Updated last year
bdusell / stack-attention
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18 · Mar 15, 2024 · Updated 2 years ago
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆43 · Apr 27, 2025 · Updated last year
tencent-ailab / ICML21_OAXE
☆28 · Sep 28, 2021 · Updated 4 years ago
deep-spin / sparse-marginalization-lvm
Official PyTorch (Lightning) implementation of the NeurIPS 2020 paper "Efficient Marginalization of Discrete and Structured Latent Variab…
☆27 · May 3, 2021 · Updated 5 years ago
justinshenk / video-pose-extractor
Dockerfile and instructions for human pose estimation implementation using Caffe, OpenCV 3.1.0 and Python 2.7.
☆12 · Mar 3, 2019 · Updated 7 years ago
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 · Jan 8, 2024 · Updated 2 years ago
RUCAIBox / ELMER
This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie…
☆26 · Oct 27, 2022 · Updated 3 years ago
ClubieDong / QAQ-KVCacheQuantization
QAQ: Quality Adaptive Quantization for LLM KV Cache
☆54 · Mar 27, 2024 · Updated 2 years ago
lemmonation / abnet
Code for NeurIPS 2020 "Incorporating BERT into Parallel Sequence Decoding with Adapters"
☆32 · Oct 18, 2022 · Updated 3 years ago
ptillet / triton-llvm-releases
☆20 · Oct 11, 2023 · Updated 2 years ago
qema / qwop-ai
QWOP AI using Q-learning
☆12 · Jul 13, 2016 · Updated 10 years ago
happywu / A3C
MXNet + OpenAI Gym implementation of A3C from "Asynchronous Methods for Deep Reinforcement Learning"
☆11 · Apr 10, 2017 · Updated 9 years ago
ictnlp / BoN-NAT
☆22 · Dec 31, 2019 · Updated 6 years ago
OdinLin / caffe2keras
A simple tool to translate a Caffe model to a Keras model
☆10 · Oct 26, 2015 · Updated 10 years ago
zhaozhengChen / RegionEmbedding
MXNet implementation of an ICLR 2018 paper: A new method of region embedding for text classification.
☆10 · Oct 14, 2018 · Updated 7 years ago
dame-cell / Triformer
Transformer components, but in Triton
☆34 · May 9, 2025 · Updated last year
clevcode / reversal-curse
Reversal Curse Experiment
☆15 · Sep 24, 2023 · Updated 2 years ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆47 · Apr 15, 2025 · Updated last year
FreedomIntelligence / complex-order
☆84 · Nov 14, 2019 · Updated 6 years ago
k-yudong / SduMap
Map app for Shandong University's Qingdao campus
☆10 · Nov 7, 2020 · Updated 5 years ago
sail-sg / Attention-Sink
[ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)
☆164 · Jul 8, 2025 · Updated last year
jungokasai / deep-shallow
☆43 · Sep 16, 2020 · Updated 5 years ago
sail-sg / tty-use
☆15 · Oct 13, 2025 · Updated 9 months ago