Triton version of GQA (grouped-query attention) flash attention, based on the tutorial
☆12Aug 4, 2024Updated last year
Alternatives and similar repositories for flash_attn_gqa
Users that are interested in flash_attn_gqa are comparing it to the libraries listed below
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 2 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated last year
- Personalized Fashion Compatibility Modeling via Metapath-guided Heterogeneous Graph Learning.☆15Nov 7, 2022Updated 3 years ago
- Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"☆14Mar 1, 2023Updated 3 years ago
- Awesome Triton Resources☆39Apr 27, 2025Updated 10 months ago
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation"☆22Apr 22, 2025Updated 10 months ago
- Parallel Associative Scan for Language Models☆18Jan 8, 2024Updated 2 years ago
- STABILIZING GRADIENTS FOR DEEP NEURAL NETWORKS VIA EFFICIENT SVD PARAMETERIZATION☆16Jun 5, 2018Updated 7 years ago
- Code for NeurIPS 2022 Spotlight paper "Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation"☆20Nov 16, 2022Updated 3 years ago
- ☆20Oct 11, 2023Updated 2 years ago
- DisCo Transformer for Non-autoregressive MT☆77Jul 28, 2022Updated 3 years ago
- ☆20Dec 24, 2024Updated last year
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719☆22Jun 5, 2024Updated last year
- A probabilistic model for contextual word representation. Accepted to ACL 2023 Findings.☆25Oct 22, 2023Updated 2 years ago
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf☆21Jul 29, 2024Updated last year
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.☆51Oct 18, 2024Updated last year
- This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie…☆26Oct 27, 2022Updated 3 years ago
- ☆23Dec 31, 2019Updated 6 years ago
- ☆29Jul 9, 2024Updated last year
- Official PyTorch (Lightning) implementation of the NeurIPS 2020 paper "Efficient Marginalization of Discrete and Structured Latent Variab…☆27May 3, 2021Updated 4 years ago
- Flash Attention in 300-500 lines of CUDA/C++☆36Aug 22, 2025Updated 6 months ago
- ☆36Feb 26, 2024Updated 2 years ago
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers.☆11Aug 30, 2024Updated last year
- ☆28Sep 28, 2021Updated 4 years ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆134Mar 21, 2025Updated 11 months ago
- Transformers components but in Triton☆34May 9, 2025Updated 9 months ago
- ☆36Oct 3, 2018Updated 7 years ago
- Code for NeurIPS2020 "Incorporating BERT into Parallel Sequence Decoding with Adapters"☆32Oct 18, 2022Updated 3 years ago
- Concurrency library☆17Oct 13, 2024Updated last year
- ☆11Dec 23, 2024Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".☆12Oct 14, 2024Updated last year
- A PyTorch implementation of Recurrent Additive Networks by Lee et al. (2017)☆29Oct 17, 2017Updated 8 years ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆83Jan 14, 2025Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Jun 28, 2025Updated 8 months ago
- A higher quality RVC pretrained model to accelerate your training process.☆21Nov 11, 2025Updated 3 months ago
- ☆14May 14, 2019Updated 6 years ago
- Material parsers and other tools and scripts, initially developed for Grobid Superconductor☆13Feb 21, 2025Updated last year
- Python Inference Script (PyIS)☆19Aug 30, 2022Updated 3 years ago
- CANdle - a library for using USB-FDCAN dongle and communicating with md80 drives☆15Sep 15, 2025Updated 5 months ago