Triton version of GQA (grouped-query attention) flash attention, based on the tutorial
☆12 · Aug 4, 2024 · Updated last year
Alternatives and similar repositories for flash_attn_gqa
Users who are interested in flash_attn_gqa are comparing it to the libraries listed below
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation" ☆22 · Apr 22, 2025 · Updated 11 months ago
- ☆19 · Dec 24, 2024 · Updated last year
- An implementation of the Prism layer (https://arxiv.org/abs/2011.04823) ☆12 · Nov 13, 2020 · Updated 5 years ago
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆51 · Oct 18, 2024 · Updated last year
- Personalized Fashion Compatibility Modeling via Metapath-guided Heterogeneous Graph Learning. ☆15 · Nov 7, 2022 · Updated 3 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 3 years ago
- Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation" ☆14 · Mar 1, 2023 · Updated 3 years ago
- Challenge Cup RK board-side code: GStreamer MPP hardware decoding and RKNN deployment of the inference model ☆13 · Sep 12, 2023 · Updated 2 years ago
- Awesome Triton Resources ☆39 · Apr 27, 2025 · Updated 10 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated 2 years ago
- Triton implementation of Flash Attention2.0 ☆51 · Jul 31, 2023 · Updated 2 years ago
- Code for NeurIPS 2022 Spotlight paper "Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation" ☆20 · Nov 16, 2022 · Updated 3 years ago
- DisCo Transformer for Non-autoregressive MT ☆77 · Jul 28, 2022 · Updated 3 years ago
- Code for the papers: "Graph Representation Learning for Multi-Task Settings: a Meta-Learning Approach", "A Meta-Learning Approach for Gra… ☆18 · Apr 26, 2022 · Updated 3 years ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆134 · Mar 21, 2025 · Updated last year
- [AACL 2023] Official implementation of paper "Towards LLM-based Fact Verification on News Claims with a Hierarchical Step-by-Step Prompti… ☆21 · Apr 1, 2024 · Updated last year
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization ☆16 · Jun 5, 2018 · Updated 7 years ago
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 · Jun 5, 2024 · Updated last year
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf ☆21 · Jul 29, 2024 · Updated last year
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆159 · Jul 8, 2025 · Updated 8 months ago
- ☆28 · Sep 28, 2021 · Updated 4 years ago
- Official PyTorch (Lightning) implementation of the NeurIPS 2020 paper "Efficient Marginalization of Discrete and Structured Latent Variab… ☆27 · May 3, 2021 · Updated 4 years ago
- Dockerfile and instructions for human pose estimation implementation using Caffe, OpenCV 3.1.0 and Python 2.7. ☆12 · Mar 3, 2019 · Updated 7 years ago
- Parallel Associative Scan for Language Models ☆18 · Jan 8, 2024 · Updated 2 years ago
- QAQ: Quality Adaptive Quantization for LLM KV Cache ☆54 · Mar 27, 2024 · Updated last year
- This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie… ☆26 · Oct 27, 2022 · Updated 3 years ago
- Code for NeurIPS2020 "Incorporating BERT into Parallel Sequence Decoding with Adapters" ☆32 · Oct 18, 2022 · Updated 3 years ago
- ☆20 · Oct 11, 2023 · Updated 2 years ago
- QWOP AI using Q-learning ☆12 · Jul 13, 2016 · Updated 9 years ago
- MXNET + OpenAI Gym implementation of A3C from "Asynchronous Methods for Deep Reinforcement Learning" ☆11 · Apr 10, 2017 · Updated 8 years ago
- ☆22 · Dec 31, 2019 · Updated 6 years ago
- Fold/unfold Markdown sections. ☆20 · Nov 24, 2025 · Updated 3 months ago
- A simple tool to translate caffe model to keras model ☆10 · Oct 26, 2015 · Updated 10 years ago
- ☆84 · Nov 14, 2019 · Updated 6 years ago
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 10 months ago
- Mxnet implementation of an ICLR 2018 paper: A new method of region embedding for text classification. ☆10 · Oct 14, 2018 · Updated 7 years ago
- Reversal Curse Experiment ☆15 · Sep 24, 2023 · Updated 2 years ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆47 · Apr 15, 2025 · Updated 11 months ago
- Map app for Shandong University, Qingdao Campus ☆10 · Nov 7, 2020 · Updated 5 years ago