knotgrass / attentionLinks

several types of attention modules written in PyTorch for learning purposes

☆52

Alternatives and similar repositories for attention

Users that are interested in attention are comparing it to the libraries listed below

Sorting:

Montinger / Transformer-Workbench
Playground for Transformers
☆53Updated last year
fkodom / grouped-query-attention-pytorch
(Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from …
☆182Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 10 months ago
fkodom / yet-another-retnet
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http…
☆106Updated last year
bobby-he / simplified_transformers
☆293Updated 11 months ago
kyegomez / SparseAttention
Pytorch Implementation of the sparse attention from the paper: "Generating Long Sequences with Sparse Transformers"
☆92Updated last month
YeonwooSung / Pytorch_mixture-of-experts
PyTorch implementation of moe, which stands for mixture of experts
☆51Updated 4 years ago
ambisinister / mla-experiments
Experiments on Multi-Head Latent Attention
☆98Updated last year
rasbt / cvpr2023
☆134Updated 2 years ago
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆115Updated last month
fkodom / soft-mixture-of-experts
PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf)
☆78Updated 2 years ago
kyegomez / Griffin
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆56Updated 3 weeks ago
nyunAI / Faster-LLM-Survey
☆42Updated last year
kyegomez / Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
☆195Updated 3 weeks ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆101Updated last year
jlamprou / Infini-Attention
Efficient Infinite Context Transformers with Infini-attention Pytorch Implementation + QwenMoE Implementation + Training Script + 1M cont…
☆83Updated last year
WailordHe / DenseSSM
A repository for DenseSSMs
☆89Updated last year
KindXiaoming / physics_of_skill_learning
We study toy models of skill learning.
☆31Updated 10 months ago
CG80499 / KAN-GPT-2
Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆121Updated last year
lucidrains / agent-attention-pytorch
Implementation of Agent Attention in Pytorch
☆92Updated last year
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆110Updated this week
kyegomez / SimpleMamba
Implementation of a modular, high-performance, and simplistic mamba for high-speed applications
☆37Updated last year
Arenaa / Accelerated-Generation-Techniques
This repository contains papers for a comprehensive survey on accelerated generation techniques in Large Language Models (LLMs).
☆11Updated last year
kyegomez / FlashAttention20
Get down and dirty with FlashAttention2.0 in pytorch, plug in and play no complex CUDA kernels
☆111Updated 2 years ago
bzhangGo / rmsnorm
Root Mean Square Layer Normalization
☆256Updated 2 years ago
alenic / timm-models-explorer
Timm model explorer
☆42Updated last year
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆119Updated last year
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆44Updated last year
Adamdad / rational_kat_cu
☆76Updated 9 months ago
kyegomez / MambaTransformer
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
☆210Updated last month