lucidrains / st-moe-pytorch
Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch
⭐ 328 · Updated 10 months ago
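For orientation, here is a minimal usage sketch of the library this page centers on. The `MoE` class, its constructor arguments, and the returned auxiliary losses are assumptions based on the repo's README, not a verified API; check the source for exact signatures.

```python
import torch
from st_moe_pytorch import MoE  # assumed import path, per the repo README

# A sparse mixture-of-experts layer: each token is routed to a small subset of experts,
# with the auxiliary losses (balance loss, router z-loss) proposed in the ST-MoE paper.
moe = MoE(
    dim = 512,          # token / model dimension
    num_experts = 16,   # more experts adds parameters at roughly constant compute per token
    gating_top_n = 2,   # top-2 routing as in the paper (argument name is an assumption)
)

tokens = torch.randn(2, 1024, 512)  # (batch, sequence, dim)

# return signature assumed from the README
out, total_aux_loss, balance_loss, router_z_loss = moe(tokens)

# during training the auxiliary loss would be added to the task loss, e.g.
# loss = task_loss + total_aux_loss
```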
Alternatives and similar repositories for st-moe-pytorch:
Users interested in st-moe-pytorch are comparing it to the libraries listed below.
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ⭐ 283 · Updated 3 weeks ago
- A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models ⭐ 727 · Updated last year
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ⭐ 511 · Updated 5 months ago
- Implementation of Recurrent Memory Transformer, NeurIPS 2022 paper, in Pytorch ⭐ 407 · Updated 3 months ago
- Some preliminary explorations of Mamba's context scaling. ⭐ 212 · Updated last year
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" ⭐ 376 · Updated last year
- ⭐ 184 · Updated this week
- Large Context Attention ⭐ 704 · Updated 2 months ago
- Understand and test language model architectures on synthetic tasks. ⭐ 192 · Updated last month
- Official implementation of TransNormerLLM: A Faster and Better LLM ⭐ 243 · Updated last year
- Annotated version of the Mamba paper ⭐ 483 · Updated last year
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ⭐ 548 · Updated 2 months ago
- ⭐ 621 · Updated last week
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ⭐ 549 · Updated 3 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ⭐ 232 · Updated 2 months ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ⭐ 226 · Updated 7 months ago
- Helpful tools and examples for working with flex-attention ⭐ 726 · Updated last week
- ⭐ 290 · Updated 4 months ago
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ⭐ 405 · Updated last week
- ⭐ 219 · Updated 10 months ago
- Recurrent Memory Transformer ⭐ 149 · Updated last year
- Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT ⭐ 211 · Updated 8 months ago
- ⭐ 143 · Updated last year
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ⭐ 666 · Updated 4 months ago
- Implementation of Block Recurrent Transformer - Pytorch ⭐ 217 · Updated 8 months ago
- ⭐ 255 · Updated last year
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … ⭐ 162 · Updated 11 months ago
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ⭐ 405 · Updated 8 months ago
- Muon optimizer: +>30% sample efficiency with <3% wallclock overhead ⭐ 575 · Updated 3 weeks ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ⭐ 158 · Updated 10 months ago