jrosseruk / Torch2Jax-DeepSeek-R1-Distill-Qwen-1.5BLinks

Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face.

☆22

Alternatives and similar repositories for Torch2Jax-DeepSeek-R1-Distill-Qwen-1.5B

Users that are interested in Torch2Jax-DeepSeek-R1-Distill-Qwen-1.5B are comparing it to the libraries listed below

Sorting:

young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35Updated 2 weeks ago
clement-bonnet / lpn
Latent Program Network (from the "Searching Latent Program Spaces" paper)
☆96Updated 6 months ago
jax-ml / jax-llm-examples
Minimal yet performant LLM examples in pure JAX
☆158Updated last week
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆162Updated 2 months ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆167Updated last month
kvfrans / lmpo
☆108Updated last week
erfanzar / jax-flash-attn2
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/…
☆27Updated 6 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆129Updated 9 months ago
google-deepmind / nanodo
☆281Updated last year
erfanzar / EasyDeL
Accelerate, Optimize performance with streamlined training and serving options with JAX.
☆310Updated this week
young-geng / scalax
A simple library for scaling up JAX programs
☆143Updated 10 months ago
yixiaoer / tpux
A set of Python scripts that makes your experience on TPU better
☆54Updated last year
imbue-ai / carbs
Cost aware hyperparameter tuning algorithm
☆168Updated last year
Jaykef / Triton-nanoGPT
Custom triton kernels for training Karpathy's nanoGPT.
☆19Updated 11 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆225Updated 2 months ago
davisyoshida / lorax
LoRA for arbitrary JAX models and functions
☆142Updated last year
modula-systems / modula
🧱 Modula software package
☆237Updated last month
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆107Updated 4 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆188Updated 10 months ago
google / tunix
A JAX-native LLM Post-Training Library
☆144Updated this week
LeonGuertler / UnstableBaselines
☆101Updated this week
radarFudan / mamba-minimal-jax
☆34Updated 9 months ago
jenkspt / gpt-jax
Jax/Flax rewrite of Karpathy's nanoGPT
☆60Updated 2 years ago
kvfrans / jax-diffusion-transformer
Implementation of Diffusion Transformer (DiT) in JAX
☆291Updated last year
yixiaoer / einshard
Einsum-like high-level array sharding API for JAX
☆35Updated last year
martin-marek / batch-size
📄Small Batch Size Training for Language Models
☆62Updated 3 weeks ago
AllanYangZhou / midGPT
Distributed pretraining of large language models (LLMs) on cloud TPU slices, with Jax and Equinox.
☆24Updated 11 months ago
alxndrTL / othello_mamba
Evaluating the Mamba architecture on the Othello game
☆48Updated last year
google-deepmind / mishax
☆142Updated last week
ethansmith2000 / fsdp_optimizers
supporting pytorch FSDP for optimizers
☆84Updated 9 months ago