ml-gde / jflux
JAX Implementation of Black Forest Labs' Flux.1 family of models
☆27 Updated 3 months ago
Alternatives and similar repositories for jflux:
Users who are interested in jflux are comparing it to the libraries listed below:
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers. ☆18 Updated last week
- Focused on fast experimentation and simplicity ☆65 Updated last month
- ☆33 Updated 4 months ago
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆29 Updated 3 weeks ago
- Supporting PyTorch FSDP for optimizers ☆75 Updated last month
- Automatically take good care of your preemptible TPUs ☆35 Updated last year
- An implementation of the Llama architecture, to instruct and delight ☆21 Updated 2 weeks ago
- Transformer with Mu-Parameterization, implemented in JAX/Flax. Supports FSDP on TPU pods. ☆30 Updated last month
- A port of the Mistral-7B model to JAX ☆30 Updated 6 months ago
- ☆27 Updated 6 months ago
- Utilities for PyTorch distributed ☆23 Updated last year
- ☆19 Updated 4 months ago
- A micro JAX-like function transformation engine (microjax) ☆30 Updated 3 months ago
- FID computation in JAX/Flax. ☆26 Updated 6 months ago
- Train vision models using JAX and 🤗 transformers ☆97 Updated this week
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆95 Updated last month
- ☆22 Updated 2 months ago
- ☆53 Updated last year
- An experiment in using Tangent to autodiff Triton ☆74 Updated last year
- Clean RL implementation using MLX ☆28 Updated 10 months ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆121 Updated 9 months ago
- Collection of autoregressive model implementations ☆77 Updated 3 weeks ago
- ☆75 Updated 6 months ago
- ☆53 Updated 2 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆44 Updated this week
- Latent Diffusion Language Models ☆68 Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 Updated 6 months ago
- ☆30 Updated this week
- Implementation of Diffusion Transformers and Rectified Flow in JAX ☆21 Updated 6 months ago
- Triton implementation of the HyperAttention algorithm ☆46 Updated last year