evanatyourservice/llm-jax

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/evanatyourservice/llm-jax)

evanatyourservice / llm-jax

Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.

☆19

Alternatives and similar repositories for llm-jax

Users that are interested in llm-jax are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

JesseFarebro / flax-mup
View on GitHub
Maximal Update Parametrization (μP) with Flax & Optax.
☆16Dec 27, 2023Updated 2 years ago
young-geng / mintext
View on GitHub
Minimal but scalable implementation of large language models in JAX
☆34Nov 28, 2025Updated 8 months ago
affjljoo3581 / starcoder-jax
View on GitHub
a Jax/Flax inference code of StarCoder
☆12Jun 12, 2023Updated 3 years ago
cloneofsimo / repa-rf
View on GitHub
☆32Nov 4, 2024Updated last year
matttreed / diloco-sim
View on GitHub
☆23Jan 5, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
ethansmith2000 / fsdp_optimizers
View on GitHub
supporting pytorch FSDP for optimizers
☆84Dec 8, 2024Updated last year
waefrebeorn / KAN-WuBu-Memory
View on GitHub
An AI character interaction system with emotional modeling and advanced memory management
☆17Oct 26, 2024Updated last year
RE-N-Y / sae
View on GitHub
☆21Nov 18, 2024Updated last year
erfanzar / jax-flash-attn2
View on GitHub
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/…
☆34Mar 4, 2025Updated last year
SonicCodes / subcloning
View on GitHub
implementation of https://arxiv.org/pdf/2312.09299
☆21Jul 3, 2024Updated 2 years ago
cgarciae / simple_flow_matching
View on GitHub
☆23Dec 16, 2024Updated last year
erfanzar / eformer
View on GitHub
(EasyDel Former) is a utility library designed to simplify and enhance the development in JAX
☆33Updated this week
lixilinx / psgd_torch
View on GitHub
Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆199May 30, 2026Updated last month
Sike-Wang / low-bit-Shampoo
View on GitHub
4-bit Shampoo for Memory-Efficient Network Training (NeurIPS 2024)
☆13Feb 13, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
HomebrewML / HeavyBall
View on GitHub
Efficient optimizers
☆336Updated this week
ethansmith2000 / TransformerExperiments
View on GitHub
☆19Dec 4, 2025Updated 7 months ago
opooladz / Preconditioned-Stochastic-Gradient-Descent
View on GitHub
A repo based on XiLin Li's PSGD repo that extends some of the experiments.
☆14Oct 7, 2024Updated last year
pharaouk / dharma
View on GitHub
☆13Apr 25, 2024Updated 2 years ago
yixiaoer / tpux
View on GitHub
A set of Python scripts that makes your experience on TPU better
☆56Sep 18, 2025Updated 10 months ago
fattorib / Flax-ResNets
View on GitHub
CIFAR10 ResNets implemented in JAX+Flax
☆12Apr 6, 2022Updated 4 years ago
qiray / MathArtist
View on GitHub
Tool for generating pictures using mathematical formulas.
☆44Nov 8, 2021Updated 4 years ago
joey00072 / microjax
View on GitHub
Jax like function transformation engine but micro, microjax
☆34Oct 25, 2024Updated last year
NMS05 / DinoV2-BERT-CLIP
View on GitHub
A simple PyTorch implementation of CLIP model using DinoV2 and BERT
☆16Sep 26, 2023Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
thomasahle / kanmlps
View on GitHub
KANs and MLPs
☆12Jun 7, 2024Updated 2 years ago
lixilinx / Fully-Trainable-SSM
View on GitHub
A fully trainable state space model (SSM)
☆16Mar 18, 2025Updated last year
ucsb-seclab / BullseyePoison
View on GitHub
Bullseye Polytope Clean-Label Poisoning Attack
☆18Nov 5, 2020Updated 5 years ago
KellerJordan / hlb-CIFAR10
View on GitHub
Train to 94% on CIFAR-10 in 4.4 seconds on a single A100
☆12Dec 30, 2023Updated 2 years ago
ridgesai / ridges-old
View on GitHub
☆12May 30, 2025Updated last year
evanatyourservice / kron_torch
View on GitHub
An implementation of PSGD Kron second-order optimizer for PyTorch
☆102Jul 24, 2025Updated last year
bhneo / decorrelated_bn
View on GitHub
An implementation of DecorrelatedBN by tensorflow
☆13Jun 30, 2022Updated 4 years ago
YuchenJin / llm.c
View on GitHub
LLM training in simple, raw C/CUDA
☆15Dec 5, 2024Updated last year
jiangycTarheel / SQ-Transformer
View on GitHub
☆10Feb 12, 2024Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
GallagherCommaJack / modulax
View on GitHub
☆18Aug 24, 2024Updated last year
allenai / sso
View on GitHub
Repository for Skill Set Optimization
☆14Jul 26, 2024Updated 2 years ago
nreimers / se-pytorch-xla
View on GitHub
☆21Sep 6, 2021Updated 4 years ago
zafstojano / wordgamebench
View on GitHub
Evaluating language models on word puzzle games
☆10Oct 25, 2024Updated last year
NKI-AI / ece_loss
View on GitHub
A PyTorch implementation of the Exclusive Cross Entropy Loss.
☆20Aug 12, 2022Updated 3 years ago
google / jaxonnxruntime
View on GitHub
A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend.
☆135Jul 6, 2026Updated 3 weeks ago
ariellubonja / orthogonal-matching-pursuit-gpu
View on GitHub
Orthogonal Matching Pursuit, parallelized on both CPU and GPU. 100x+ Speedup
☆17Apr 24, 2026Updated 3 months ago