Francesco215 / autoregressive_diffusion

Video Diffusion Model. Autoregressive, long context, efficient training and inference. WIP

☆29

Alternatives and similar repositories for autoregressive_diffusion

Users who are interested in autoregressive_diffusion are comparing it to the libraries listed below.

LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆100 Updated 2 weeks ago
KindXiaoming / grow-crystals
Getting crystal-like representations with harmonic loss
☆193 Updated 4 months ago
apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
☆143 Updated 2 months ago
cloneofsimo / scaling-guide
WIP
☆94 Updated 11 months ago
alexiglad / EBT
PyTorch Code for Energy-Based Transformers paper -- generalizable reasoning and scalable learning
☆409 Updated 3 weeks ago
dome272 / Flow-Matching
My take on Flow Matching
☆70 Updated 7 months ago
fal-ai-community / alphabet-dataset
Synthetic Alphabet Dataset
☆19 Updated 4 months ago
zaydzuhri / softpick-attention
Implementations of attention with the softpick function, naive and FlashAttention-2
☆81 Updated 3 months ago
fal-ai / diffusion-speedrun
Focused on fast experimentation and simplicity
☆76 Updated 7 months ago
lucidrains / transformer-directed-evolution
Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster
☆70 Updated 2 months ago
apple / ml-act
☆53 Updated 8 months ago
idiap / sigma-gpt
σ-GPT: A New Approach to Autoregressive Models
☆67 Updated 11 months ago
Sohl-Dickstein / fractal
The boundary of neural network trainability is fractal
☆215 Updated last year
facebookresearch / capi
Code and weights for the paper "Cluster and Predict Latent Patches for Improved Masked Image Modeling"
☆115 Updated 4 months ago
cloneofsimo / zeroshampoo
☆34 Updated 11 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆101 Updated 7 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆130 Updated last year
akarshkumar0101 / fer
Code for the Fractured Entangled Representation Hypothesis position paper!
☆174 Updated 2 months ago
dvruette / gidd
Code accompanying the paper "Generalized Interpolating Discrete Diffusion"
☆97 Updated 2 months ago
ShadeAlsha / ICon
ICLR 2025 - official implementation for "I-Con: A Unifying Framework for Representation Learning"
☆109 Updated last month
AllanYangZhou / universal_neural_functional
☆51 Updated last year
Kai-46 / minFM
☆109 Updated this week
KellerJordan / cifar10-airbench
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
☆275 Updated 3 weeks ago
JinjieNi / dlms-are-super-data-learners
The official GitHub repo for "Diffusion Language Models are Super Data Learners".
☆77 Updated this week
evanatyourservice / kron_torch
An implementation of PSGD Kron second-order optimizer for PyTorch
☆94 Updated 2 weeks ago
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆190 Updated last year
jfpuget / ARC-AGI-Challenge-2024
☆56 Updated 8 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆190 Updated 8 months ago
zacharyhorvitz / Fk-Diffusion-Steering
A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards.
☆174 Updated last month
subhashk01 / LLM-addition
LLMs represent numbers on a helix and manipulate that helix to do addition.
☆25 Updated 6 months ago