J-Rosser-UK / Torch2Jax-DeepSeek-R1-Distill-Qwen-1.5B

Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face.

☆16

Alternatives and similar repositories for Torch2Jax-DeepSeek-R1-Distill-Qwen-1.5B:

Users interested in Torch2Jax-DeepSeek-R1-Distill-Qwen-1.5B are comparing it to the repositories listed below.

Reytuag / transformerXL_PPO_JAX
☆74 Updated 4 months ago
clement-bonnet / lpn
Latent Program Network (from the "Searching Latent Program Spaces" paper)
☆76 Updated 3 weeks ago
luchris429 / JaxLife
An Open-Ended Agentic Simulator
☆45 Updated 7 months ago
instadeepai / sebulba
🪐 The Sebulba architecture to scale reinforcement learning on Cloud TPUs in JAX
☆57 Updated last year
balrog-ai / BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
☆126 Updated this week
luchris429 / popjaxrl
Benchmarking RL for POMDPs in Pure JAX [Code for "Structured State Space Models for In-Context Reinforcement Learning" (NeurIPS 2023)]
☆99 Updated last year
DramaCow / jaxued
☆75 Updated last week
jax-ml / jax-llm-examples
☆87 Updated 2 weeks ago
AlexGoldie / rl-learned-optimization
Official Implementation of "Can Learned Optimization Make Reinforcement Learning Less Difficult"
☆22 Updated 4 months ago
facebookresearch / MRQ
MR.Q is a general-purpose model-free reinforcement learning algorithm.
☆80 Updated last month
facebookresearch / minimax
Efficient baselines for autocurricula in JAX.
☆186 Updated 7 months ago
instadeepai / flashbax
⚡ Flashbax: Accelerated Replay Buffers in JAX
☆229 Updated this week
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆34 Updated 5 months ago
mttga / purejaxql
Simple single-file baselines for Q-Learning in pure-GPU setting
☆150 Updated 2 weeks ago
google-deepmind / nanodo
☆215 Updated 8 months ago
keraJLi / rejax
☆193 Updated 3 months ago
jenkspt / gpt-jax
Jax/Flax rewrite of Karpathy's nanoGPT
☆57 Updated 2 years ago
MatX-inc / seqax
seqax = sequence modeling + JAX
☆151 Updated 2 weeks ago
modula-systems / modula
🧱 Modula software package
☆187 Updated this week
MichaelTMatthews / Jax2D
Highly scalable 2D JAX physics engine.
☆53 Updated 3 weeks ago
vwxyzjn / cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
☆109 Updated 7 months ago
epignatelli / navix
Accelerated minigrid environments with JAX
☆132 Updated 8 months ago
tristandeleu / gfn-maxent-rl
Comparison between GFlowNets & Maximum Entropy RL
☆16 Updated last year
imbue-ai / carbs
Cost aware hyperparameter tuning algorithm
☆149 Updated 9 months ago
FLAIROx / jaxirl
Contains JAX implementation of algorithms for inverse reinforcement learning
☆71 Updated 7 months ago
cgarciae / nanoGPT-jax
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆32 Updated last year
EdanToledo / Stoix
🏛️ A research-friendly codebase for fast experimentation of single-agent reinforcement learning in JAX • End-to-End JAX RL
☆304 Updated this week
kvfrans / jax-diffusion-transformer
Implementation of Diffusion Transformer (DiT) in JAX
☆270 Updated 9 months ago
MichalBortkiewicz / JaxGCRL
Goal-Conditioned Reinforcement Learning with JAX
☆137 Updated this week
MichaelTMatthews / Craftax
(Crafter + NetHack) in JAX. ICML 2024 Spotlight.
☆295 Updated last month