fkodom / grouped-query-attention-pytorch
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints" (https://arxiv.org/pdf/2305.13245.pdf)
☆189 · May 9, 2024 · Updated last year
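Grouped-query attention sits between multi-head attention (one key/value head per query head) and multi-query attention (a single shared key/value head): the query heads are split into groups, and each group shares one key/value head. The sketch below is a minimal illustration of that idea in plain PyTorch, not the repository's actual API; the function name, head counts, and tensor shapes are assumptions chosen for the example.

```python
# Minimal sketch of grouped-query attention (GQA), assuming PyTorch >= 2.0.
# Not the repository's API: `grouped_query_attention` and the shapes below
# are illustrative only.
import torch
import torch.nn.functional as F


def grouped_query_attention(q, k, v, num_kv_heads):
    """q: (batch, num_heads, seq, dim); k, v: (batch, num_kv_heads, seq, dim).

    Each group of num_heads // num_kv_heads query heads shares one key/value
    head, which is what distinguishes GQA from vanilla MHA
    (num_kv_heads == num_heads) and MQA (num_kv_heads == 1).
    """
    batch, num_heads, seq_len, head_dim = q.shape
    group_size = num_heads // num_kv_heads
    # Repeat each K/V head so it lines up with its group of query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    return F.scaled_dot_product_attention(q, k, v)


# Illustrative shapes: 8 query heads sharing 2 key/value heads.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v, num_kv_heads=2)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```

Newer PyTorch releases (2.5+) also expose an `enable_gqa` flag on `scaled_dot_product_attention` that performs this grouping internally, avoiding the explicit `repeat_interleave`.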
Alternatives and similar repositories for grouped-query-attention-pytorch
Users interested in grouped-query-attention-pytorch are comparing it to the libraries listed below.
- The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model… ☆15 · Dec 11, 2023 · Updated 2 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆18 · Mar 15, 2024 · Updated last year
- Explore how to get VQ-VAE models efficiently! ☆67 · Jul 24, 2025 · Updated 6 months ago
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http… ☆106 · Nov 24, 2023 · Updated 2 years ago
- ☆20 · May 30, 2024 · Updated last year
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆58 · Apr 20, 2024 · Updated last year
- ☆20 · Oct 25, 2022 · Updated 3 years ago
- An Experiment on Dynamic NTK Scaling RoPE ☆64 · Nov 26, 2023 · Updated 2 years ago
- Implementation of GateLoop Transformer in Pytorch and Jax ☆92 · Jun 18, 2024 · Updated last year
- ☆24 · Sep 25, 2024 · Updated last year
- Implementation of Diffusion Transformers and Rectified Flow in Jax ☆27 · Jul 9, 2024 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Jun 6, 2024 · Updated last year
- ☆13 · Mar 16, 2025 · Updated 10 months ago
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … ☆14 · Aug 25, 2023 · Updated 2 years ago
- semantic tokenizer for speech and music ☆21 · Jul 6, 2025 · Updated 7 months ago
- PyTorch implementation for PaLM: A Hybrid Parser and Language Model. ☆10 · Jan 7, 2020 · Updated 6 years ago
- Advanced Formal Language Theory (263-5352-00L; Frühjahr 2023) ☆10 · Feb 21, 2023 · Updated 2 years ago
- ☆13 · Jan 22, 2025 · Updated last year
- [CVPR 2022] Code for the paper "Quantization-aware Deep Optics for Diffractive Snapshot Hyperspectral Imaging". ☆16 · Oct 6, 2022 · Updated 3 years ago
- Codebase for the paper "Schema-guided User Satisfaction Modeling for Task-oriented Dialogues" ☆11 · Aug 6, 2025 · Updated 6 months ago
- ☆19 · Jul 21, 2025 · Updated 6 months ago
- ☆15 · May 11, 2025 · Updated 9 months ago
- [ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models" ☆13 · Aug 6, 2024 · Updated last year
- Here are the codes for the "3DUNetGSFormer: A deep learning pipeline for complex wetland mapping using generative adversarial networks an… ☆10 · Nov 22, 2022 · Updated 3 years ago
- [NeurIPS 2024] Official Implementation of "SDformer: Similarity-driven Discrete Transformer For Time Series Generation" ☆13 · May 23, 2025 · Updated 8 months ago
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens ☆25 · Nov 6, 2023 · Updated 2 years ago
- Zero-shot evaluation on LEXGLUE tasks with GPT-3.5 ☆29 · Mar 11, 2023 · Updated 2 years ago
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw…☆31May 7, 2024Updated last year
- Checkpointable dataset utilities for foundation model training☆32Jan 29, 2024Updated 2 years ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN.☆73May 26, 2024Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated last year
- Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023)☆57Sep 27, 2024Updated last year
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.☆50Jun 16, 2023Updated 2 years ago
- c++ implementation of a simple-virtual-machine☆14Sep 19, 2014Updated 11 years ago
- Extremely simple MoE implementation, mostly based off Switch Transformer☆13Feb 26, 2024Updated last year
- AI Demo project: a collection of hands-on examples for developers who want to learn and explore artificial intelligence (AI) technologies. ☆25 · Jan 3, 2026 · Updated last month
- [TPAMI 2024] The official repo for "Stereo Image Restoration via Attention-Guided Correspondence Learning" ☆10 · Apr 21, 2024 · Updated last year
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment ☆10 · Mar 1, 2025 · Updated 11 months ago
- A PyTorch implementation of Multimodal Few-Shot Learning with Frozen Language Models with OPT. ☆44 · Jul 23, 2022 · Updated 3 years ago