Francesco215 / autoregressive_diffusion
Video Diffusion Model. Autoregressive, long context, efficient training and inference. WIP
☆34 Updated 3 months ago
Alternatives and similar repositories for autoregressive_diffusion
Users who are interested in autoregressive_diffusion are comparing it to the repositories listed below.
- σ-GPT: A New Approach to Autoregressive Models ☆70 Updated last year
- Focused on fast experimentation and simplicity ☆76 Updated 11 months ago
- ☆105 Updated 4 months ago
- Flash Attention Triton kernel with support for second-order derivatives ☆121 Updated this week
- RS-IMLE ☆43 Updated last year
- WIP ☆93 Updated last year
- Getting crystal-like representations with harmonic loss ☆193 Updated 8 months ago
- Synthetic Alphabet Dataset ☆19 Updated 8 months ago
- DeMo: Decoupled Momentum Optimization ☆197 Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆149 Updated 2 months ago
- Implementation of the proposed Spline-Based Transformer from Disney Research ☆105 Updated last year
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆103 Updated 11 months ago
- Code and weights for the paper "Cluster and Predict Latents Patches for Improved Masked Image Modeling" ☆125 Updated 8 months ago
- Code for the Fractured Entangled Representation Hypothesis position paper! ☆216 Updated last month
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ☆80 Updated 6 months ago
- ☆33 Updated 11 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆97 Updated 4 months ago
- ☆34 Updated last year
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆110 Updated 6 months ago
- Don't just regulate gradients like in Muon, regulate the weights too ☆31 Updated 4 months ago
- Jax Codebase for Evolutionary Strategies at the Hyperscale ☆188 Updated last month
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n… ☆43 Updated last year
- My take on Flow Matching ☆86 Updated 11 months ago
- PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning ☆565 Updated last month
- Implementation of a framework for Genie2 in Pytorch ☆156 Updated 11 months ago
- 📄Small Batch Size Training for Language Models ☆68 Updated 2 months ago
- ☆162 Updated 4 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆85 Updated 3 months ago
- LLMs represent numbers on a helix and manipulate that helix to do addition. ☆27 Updated 10 months ago
- Minimal GPT (~350 lines with a simple task to test it) ☆63 Updated 3 weeks ago