idiap / sigma-gpt

σ-GPT: A New Approach to Autoregressive Models

☆70

Alternatives and similar repositories for sigma-gpt

Users who are interested in sigma-gpt are comparing it to the repositories listed below.

cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆132 Updated last year
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆168 Updated 10 months ago
epfml / DenseFormer
☆82 Updated last year
dvruette / barrel-rec-pytorch
☆53 Updated last year
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆103 Updated 11 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108 Updated 9 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197 Updated last year
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆75 Updated 11 months ago
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
☆224 Updated last year
joey00072 / ohara
Collection of autoregressive model implementations
☆85 Updated 7 months ago
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆195 Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174 Updated 5 months ago
apple / ml-planner
☆56 Updated last year
PolymathicAI / xVal
Repository for code used in the xVal paper
☆145 Updated last year
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135 Updated 11 months ago
cloneofsimo / min-fsdp
☆91 Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆105 Updated 4 months ago
NousResearch / StripedHyenaTrainer
☆62 Updated last year
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆294 Updated 6 months ago
kklemon / FlashPerceiver
Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.
☆31 Updated last year
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131 Updated last month
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25 Updated 10 months ago
CERC-AAI / Robin
☆63 Updated last year
apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
☆148 Updated 2 months ago
vmicheli / delta-iris
Efficient World Models with Context-Aware Tokenization. ICML 2024
☆114 Updated last year
iliao2345 / CompressARC
☆201 Updated 3 months ago
lucidrains / mind-evolution
Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from Deepmind
☆57 Updated 6 months ago
okarthikb / state-space-models
☆28 Updated last year
cloneofsimo / scaling-guide
WIP
☆93 Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated last year