jdeschena / sdtt
[ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models
☆29 Updated 4 months ago
Alternatives and similar repositories for sdtt
Users who are interested in sdtt are comparing it to the repositories listed below
- Mamba training library developed by kotoba technologies ☆71 Updated last year
- Official implementation of "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" ☆113 Updated 6 months ago
- Checkpointable dataset utilities for foundation model training ☆32 Updated last year
- CycleQD is a framework for parameter space model merging. ☆42 Updated 6 months ago
- An AI benchmark for creative, human-like problem solving using Sudoku variants ☆84 Updated last week
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task. ☆47 Updated 6 months ago
- Trying out the Mamba architecture on small examples (cifar-10, shakespeare char level etc.) ☆48 Updated last year
- Code for Discovering Preference Optimization Algorithms with and for Large Language Models ☆63 Updated last year
- Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP ☆11 Updated last year
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆118 Updated last month
- [ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction ☆56 Updated 2 months ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆97 Updated 2 months ago
- Codes for the paper "A mathematical perspective on Transformers". ☆37 Updated last year
- ☆104 Updated 2 years ago
- Getting crystal-like representations with harmonic loss ☆192 Updated 4 months ago
- [ICLR2025] DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models ☆259 Updated 2 months ago
- Train, tune, and infer Bamba model ☆130 Updated 2 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆36 Updated 5 months ago
- Japanese LLaMa experiment ☆53 Updated 8 months ago
- Synthetic Alphabet Dataset ☆19 Updated 4 months ago
- Official Jax Implementation of MD4 Masked Diffusion Models ☆118 Updated 5 months ago
- ☆33 Updated 3 weeks ago
- ☆22 Updated last year
- Plug in & Play Pytorch Implementation of the paper: "Evolutionary Optimization of Model Merging Recipes" by Sakana AI ☆30 Updated 8 months ago
- ☆83 Updated 11 months ago
- Implementations of attention with the softpick function, naive and FlashAttention-2 ☆81 Updated 3 months ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆88 Updated last year
- This is the repository for "SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Recognition" ☆16 Updated 10 months ago
- Fork of Flame repo for training of some new stuff in development ☆14 Updated 3 weeks ago
- Ongoing Research Project for continual pre-training LLM (dense mode) ☆42 Updated 5 months ago