ch33nchan / RLlama

☆15

Alternatives and similar repositories for RLlama

Users who are interested in RLlama are comparing it to the repositories listed below.

nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆46 Updated 3 months ago
tokenbender / avataRL
rl from zero pretrain, can it be done? we'll see.
☆56 Updated this week
tyler-romero / microR1
Simple repository for training small reasoning models
☆33 Updated 4 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆94 Updated 3 months ago
naklecha / llm-inference-optimizations-explained
in this repository, i'm going to implement increasingly complex llm inference optimizations
☆60 Updated last month
hkproj / multi-latent-attention
☆39 Updated last month
okarthikb / state-space-models
☆27 Updated 11 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆68 Updated 3 months ago
brendanhogan / picoDeepResearch
☆63 Updated last month
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆101 Updated 3 months ago
PrimeIntellect-ai / genesys
☆127 Updated 3 months ago
jxmorris12 / embzip
lossily compress representation vectors using product quantization
☆57 Updated 2 months ago
willccbb / trl
Train transformer language models with reinforcement learning.
☆19 Updated 4 months ago
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆80 Updated last month
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆58 Updated 4 months ago
xjdr-alt / muzero_sketch
☆38 Updated 11 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆66 Updated 2 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆28 Updated last month
facebookresearch / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆31 Updated 2 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆41 Updated last month
google-deepmind / mishax
☆134 Updated 2 months ago
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆45 Updated 2 months ago
saurabhaloneai / Llama-3-From-Scratch-In-Pure-Jax
This repository contains a simple Llama 3 implementation in pure JAX.
☆66 Updated 4 months ago
allenai / infinigram-api
☆61 Updated 3 weeks ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆53 Updated 4 months ago
charlesfrye / cuda-substrings
Because it's there.
☆16 Updated 9 months ago
smolorg / smoltropix
MLX port for xjdr's entropix sampler (mimics jax implementation)
☆64 Updated 7 months ago
open-thought / reasoning-gym-eval
Collection of LLM completions for reasoning-gym task datasets
☆24 Updated last month
jxmorris12 / cde
code for training & evaluating Contextual Document Embedding models
☆195 Updated last month
xjdr-alt / llmri
look how they massacred my boy
☆63 Updated 8 months ago