jdeschena / sdtt

[ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models

☆29

Alternatives and similar repositories for sdtt

Users who are interested in sdtt are comparing it to the libraries listed below.

kotoba-tech / kotomamba
Mamba training library developed by kotoba technologies
☆71 Updated last year
SakanaAI / TAID
Official implementation of "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models"
☆113 Updated 6 months ago
iwiwi / epochraft
Checkpointable dataset utilities for foundation model training
☆32 Updated last year
SakanaAI / CycleQD
CycleQD is a framework for parameter space model merging.
☆42 Updated 6 months ago
SakanaAI / Sudoku-Bench
An AI benchmark for creative, human-like problem solving using Sudoku variants
☆84 Updated last week
facebookresearch / MemoryMosaics
Memory Mosaics are networks of associative memories working in concert to achieve a prediction task.
☆47 Updated 6 months ago
apapiu / mamba_small_bench
Trying out the Mamba architecture on small examples (cifar-10, shakespeare char level etc.)
☆48 Updated last year
luchris429 / DiscoPOP
Code for Discovering Preference Optimization Algorithms with and for Large Language Models
☆63 Updated last year
iwiwi / epochraft-hf-fsdp
Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP
☆11 Updated last year
zhixuan-lin / forgetting-transformer
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
☆118 Updated last month
ChenWu98 / algorithmic-creativity
[ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
☆56 Updated 2 months ago
dvruette / gidd
Code accompanying the paper "Generalized Interpolating Discrete Diffusion"
☆97 Updated 2 months ago
borjanG / 2023-transformers-rotf
Code for the paper "A mathematical perspective on Transformers".
☆37 Updated last year
igul222 / plaid
☆104 Updated 2 years ago
KindXiaoming / grow-crystals
Getting crystal-like representations with harmonic loss
☆192 Updated 4 months ago
HKUNLP / DiffuLLaMA
[ICLR 2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models
☆259 Updated 2 months ago
foundation-model-stack / bamba
Train, tune, and infer Bamba model
☆130 Updated 2 months ago
kuleshov-group / remdm
Remasking Discrete Diffusion Models with Inference-Time Scaling
☆36 Updated 5 months ago
lighttransport / japanese-llama-experiment
Japanese LLaMa experiment
☆53 Updated 8 months ago
fal-ai-community / alphabet-dataset
Synthetic Alphabet Dataset
☆19 Updated 4 months ago
google-deepmind / md4
Official Jax Implementation of MD4 Masked Diffusion Models
☆118 Updated 5 months ago
LIONS-EPFL / scion
☆33 Updated 3 weeks ago
Ino-Ichan / GIT-LLM
☆22 Updated last year
kyegomez / EvoVLM-JP
Plug in & Play Pytorch Implementation of the paper: "Evolutionary Optimization of Model Merging Recipes" by Sakana AI
☆30 Updated 8 months ago
RobertCsordas / moeut
☆83 Updated 11 months ago
zaydzuhri / softpick-attention
Implementations of attention with the softpick function, naive and FlashAttention-2
☆81 Updated 3 months ago
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆88 Updated last year
jimmyxu123 / SELECT
This is the repository for "SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Recognition"
☆16 Updated 10 months ago
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆14 Updated 3 weeks ago
okoge-kaz / llm-recipes
Ongoing Research Project for continual pre-training LLM (dense mode)
☆42 Updated 5 months ago