alexiglad / EBT
PyTorch code for the Energy-Based Transformers paper: generalizable reasoning and scalable learning
☆569 · Updated last month
Alternatives and similar repositories for EBT
Users who are interested in EBT are comparing it to the libraries listed below.
- This repo contains the code for the paper "Intuitive physics understanding emerges from self-supervised pretraining on natural videos" ☆209 · Updated 10 months ago
- ☆793 · Updated 3 weeks ago
- RLP: Reinforcement as a Pretraining Objective ☆220 · Updated 2 months ago
- H-Net: Hierarchical Network with Dynamic Chunking ☆798 · Updated last month
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025) ☆527 · Updated 3 months ago
- A Reproduction of GDM's Nested Learning Paper ☆524 · Updated last month
- ☆211 · Updated 4 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from Nvidia AI ☆294 · Updated 7 months ago
- ☆163 · Updated 4 months ago
- [ICLR'25] Artificial Kuramoto Oscillatory Neurons ☆106 · Updated 2 months ago
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆580 · Updated 10 months ago
- ☆650 · Updated 8 months ago
- ☆303 · Updated 8 months ago
- dLLM: Simple Diffusion Language Modeling ☆1,526 · Updated last week
- ☆152 · Updated 3 months ago
- ☆82 · Updated last year
- Normalized Transformer (nGPT) ☆195 · Updated last year
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆279 · Updated last month
- JAX codebase for Evolutionary Strategies at the Hyperscale ☆205 · Updated last week
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025. ☆133 · Updated 3 months ago
- Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆441 · Updated 2 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆345 · Updated last year
- Official JAX implementation of End-to-End Test-Time Training for Long Context ☆102 · Updated this week
- ⏰ AI conference deadline countdowns ☆304 · Updated this week
- Large multi-modal models (L3M) pre-training. ☆223 · Updated 3 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model ☆857 · Updated this week
- Implementation of Diffusion Transformer (DiT) in JAX ☆300 · Updated last year
- ☆205 · Updated last year
- Library for text-to-text regression, applicable to any input string representation and allows pretraining and fine-tuning over multiple r… ☆304 · Updated 2 weeks ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆341 · Updated last month