facebookresearch / QincoLinks

Residual Quantization with Implicit Neural Codebooks

☆94

Alternatives and similar repositories for Qinco

Users that are interested in Qinco are comparing it to the libraries listed below

Sorting:

zhixuan-lin / forgetting-transformer
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
☆116Updated last week
fla-org / flash-bidirectional-linear-attention
Triton implement of bi-directional (non-causal) linear attention
☆52Updated 5 months ago
lucidrains / deep-cross-attention
Implementation of the proposed DeepCrossAttention by Heddes et al at Google research, in Pytorch
☆89Updated 4 months ago
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated 10 months ago
RWKV / RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best…
☆49Updated 4 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆127Updated 10 months ago
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆31Updated last year
goombalab / hydra
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆144Updated 5 months ago
RobertCsordas / moeut
☆82Updated 10 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98Updated 9 months ago
g-luo / vlm_cross_modal_reps
Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025
☆27Updated 2 months ago
qiuzh20 / RMoE
Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)
☆22Updated 11 months ago
piotrpiekos / MoSA
User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou…
☆22Updated 2 months ago
goombalab / phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆110Updated 10 months ago
lucidrains / self-reasoning-tokens-pytorch
Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto
☆56Updated last year
facebookresearch / Mixture-of-Transformers
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025.
☆86Updated 2 months ago
NX-AI / mlstm_kernels
Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels.
☆64Updated 2 weeks ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆185Updated 4 months ago
lucidrains / simplicial-attention
Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…
☆34Updated this week
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆79Updated last year
Optimization-AI / FastCLIP
Distributed Optimization Infra for learning CLIP models
☆26Updated 9 months ago
lucidrains / hyper-connections
Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public
☆87Updated last month
YuchuanTian / DiJiang
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…
☆101Updated last year
lucidrains / adam-atan2-pytorch
Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch
☆110Updated 7 months ago
lucidrains / maskbit-pytorch
Implementation of the proposed MaskBit from Bytedance AI
☆82Updated 8 months ago
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆66Updated 9 months ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆40Updated last year
HelmholtzAI-FZJ / flex_gen
☆17Updated 6 months ago
lucidrains / multimodal-dit-pytorch
Implementation of a multimodal diffusion transformer in Pytorch
☆102Updated last year
apoorvkh / torchrunx
Easily run PyTorch on multiple GPUs & machines
☆46Updated 3 weeks ago