kyegomez / MGQALinks

The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints"

☆16

Alternatives and similar repositories for MGQA

Users that are interested in MGQA are comparing it to the libraries listed below

Sorting:

kyegomez / Paper-Implementation-Template
A simple reproducible template to implement AI research papers
☆24Updated 10 months ago
WailordHe / DenseSSM
A repository for DenseSSMs
☆87Updated last year
OscarXZQ / weight-selection
☆181Updated 9 months ago
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆99Updated last week
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆107Updated 3 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98Updated 9 months ago
lucidrains / soft-moe-pytorch
Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch
☆304Updated 3 months ago
fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆75Updated last year
kyegomez / MC-ViT
Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"
☆20Updated 3 months ago
LiqunMa / FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
☆49Updated last year
kyegomez / TTL
Pytorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"
☆25Updated 2 weeks ago
sramshetty / mixture-of-depths
An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆35Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆111Updated 6 months ago
NVlabs / STL
Official Pytorch Implementation of Self-emerging Token Labeling
☆34Updated last year
UCDvision / NOLA
Code for NOLA, an implementation of "nola: Compressing LoRA using Linear Combination of Random Basis"
☆56Updated 10 months ago
fangyuan-ksgk / Mini-LLaVA
A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.
☆94Updated 7 months ago
swaggy-TN / EfficientVLM
EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning (ACL 2023)
☆30Updated 2 years ago
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆48Updated 2 years ago
bwconrad / soft-moe
PyTorch implementation of "From Sparse to Soft Mixtures of Experts"
☆58Updated last year
snu-mllab / LayerMerge
Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024)
☆29Updated 11 months ago
SHI-Labs / OLA-VLM
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆60Updated 4 months ago
berlino / gated_linear_attention
☆105Updated last year
ByungKwanLee / Phantom
[Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…
☆60Updated 9 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆31Updated last year
lucidrains / sinkhorn-router-pytorch
Self contained pytorch implementation of a sinkhorn based router, for mixture of experts or otherwise
☆36Updated 10 months ago
tobna / TaylorShift
This repository contains the code for the paper "TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back)…
☆10Updated 4 months ago
kirill-vish / Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
☆101Updated 10 months ago
kyegomez / Mirasol
Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆26Updated 5 months ago
pengzhangzhi / Awesome-Mamba
Awesome list of papers that extend Mamba to various applications.
☆134Updated last month
YeonwooSung / Pytorch_mixture-of-experts
PyTorch implementation of moe, which stands for mixture of experts
☆45Updated 4 years ago