Laz4rz / RLLinks

☆16

Alternatives and similar repositories for RL

Users that are interested in RL are comparing it to the libraries listed below


tokenbender / avataRL
rl from zero pretrain, can it be done? we'll see.
☆56 · Updated this week
naklecha / llm-inference-optimizations-explained
in this repository, i'm going to implement increasingly complex llm inference optimizations
☆61 · Updated last month
kmohan321 / Research_Papers
☆46 · Updated 2 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆58 · Updated 4 months ago
smolorg / smoltropix
MLX port for xjdr's entropix sampler (mimics the JAX implementation)
☆64 · Updated 7 months ago
tyler-romero / microR1
Simple repository for training small reasoning models
☆33 · Updated 4 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆28 · Updated last month
brendanhogan / picoDeepResearch
☆63 · Updated last month
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆53 · Updated 4 months ago
okarthikb / state-space-models
☆27 · Updated 11 months ago
hkproj / multi-latent-attention
☆39 · Updated last month
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆78 · Updated 6 months ago
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆46 · Updated 3 months ago
KhoomeiK / sanskrit-ocr
☆43 · Updated this week
xjdr-alt / llmri
look how they massacred my boy
☆63 · Updated 8 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆66 · Updated 2 months ago
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆72 · Updated this week
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆80 · Updated last month
ariG23498 / gemma3-object-detection
Fine tune Gemma 3 on an object detection task
☆57 · Updated this week
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆41 · Updated last month
BBischof / yapping
Verbosity control for AI agents
☆63 · Updated last year
axeld5 / pali_reason
Testing paligemma2 finetuning on reasoning dataset
☆18 · Updated 6 months ago
thubZ09 / All-Things-Multimodal
Hub for researchers exploring VLMs and Multimodal Learning:)
☆39 · Updated this week
anpaure / cp_eval
Tiny evaluation of leading LLMs on competitive programming problems
☆14 · Updated 7 months ago
yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆29 · Updated 2 months ago
samefarrar / entropix_mlx
Modify Entropy Based Sampling to work with Mac Silicon via MLX
☆50 · Updated 7 months ago
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 · Updated 8 months ago
jxmorris12 / embzip
lossily compress representation vectors using product quantization
☆57 · Updated 2 months ago
joey00072 / ohara
Collection of autoregressive model implementations
☆85 · Updated 2 months ago
xjdr-alt / muzero_sketch
☆38 · Updated 11 months ago