lucidrains / JEPA-pytorch
☆114 · Updated this week
Related projects:
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework) ☆184 · Updated 2 years ago
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ☆202 · Updated 3 weeks ago
- Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts ☆101 · Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax ☆86 · Updated 3 months ago
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆78 · Updated 2 years ago
- Implementation of Hierarchical Transformer Memory (HTM) for Pytorch ☆73 · Updated 3 years ago
- Experiments in Joint Embedding Predictive Architectures (JEPAs); a minimal sketch of the idea follows this list ☆32 · Updated 8 months ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆203 · Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention ☆85 · Updated last month
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆94 · Updated last year
- Another attempt at a long-context / efficient transformer by me ☆37 · Updated 2 years ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto ☆53 · Updated 4 months ago
- Implementation of Discrete Key / Value Bottleneck, in Pytorch ☆87 · Updated last year
- Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI ☆74 · Updated 2 years ago
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch ☆87 · Updated 8 months ago
- Pytorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆45 · Updated last month
- Implementation of some personal helper functions for Einops, my favorite tensor manipulation library ❤️ ☆52 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ☆53 · Updated last month
- Implementation of Block Recurrent Transformer - Pytorch ☆211 · Updated 3 weeks ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆84 · Updated 4 months ago
- Official repository of Pretraining Without Attention (BiGS); BiGS is the first model to achieve BERT-level transfer learning on the GLUE … ☆113 · Updated 6 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 · Updated last year
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆56 · Updated 10 months ago
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick the top-k ☆46 · Updated last year
- Automatic gradient descent ☆206 · Updated last year
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆82 · Updated 3 weeks ago
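
For context on the repository itself: a JEPA (Joint Embedding Predictive Architecture, as in I-JEPA by Assran et al.) trains a context encoder to predict, in representation space rather than pixel or token space, what a slowly updated target encoder produces for masked-out regions; the target encoder is an exponential moving average of the context encoder. Below is a minimal, self-contained PyTorch sketch of that training loop. Everything here - the `Encoder`, `predictor`, `training_step`, and the `ema_decay` value - is an illustrative assumption, not this repository's actual API.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy encoder: projects each patch embedding (illustrative only)."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return self.proj(x)

context_encoder = Encoder()
target_encoder = copy.deepcopy(context_encoder)  # updated by EMA, never by gradients
for p in target_encoder.parameters():
    p.requires_grad_(False)

predictor = nn.Linear(64, 64)  # predicts target embeddings from context embeddings

opt = torch.optim.AdamW(
    [*context_encoder.parameters(), *predictor.parameters()], lr=3e-4
)

def training_step(patches, context_idx, target_idx, ema_decay=0.996):
    # patches: (batch, num_patches, dim)
    ctx = context_encoder(patches[:, context_idx])    # encode only the visible context
    with torch.no_grad():
        tgt = target_encoder(patches)[:, target_idx]  # frozen targets for masked regions

    # Predict target embeddings from pooled context. A real JEPA conditions
    # the predictor on the target positions; mean-pooling is a simplification.
    pred = predictor(ctx.mean(dim=1, keepdim=True)).expand_as(tgt)

    # The regression loss lives entirely in embedding space.
    loss = F.smooth_l1_loss(pred, tgt)
    opt.zero_grad()
    loss.backward()
    opt.step()

    # EMA update: move target encoder weights slightly toward the context encoder.
    with torch.no_grad():
        for pt, pc in zip(target_encoder.parameters(), context_encoder.parameters()):
            pt.lerp_(pc, 1.0 - ema_decay)
    return loss.item()

# usage with random data: 12 visible patches, 4 masked targets
patches = torch.randn(2, 16, 64)
loss = training_step(patches, context_idx=torch.arange(12), target_idx=torch.arange(12, 16))
```

The EMA target encoder is what prevents representational collapse: if both encoders received gradients, the trivial solution of mapping every input to a constant embedding would minimize the loss.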