OpenPipe / rl-experiments

OpenPipe Reinforcement Learning Experiments

☆28

Alternatives and similar repositories for rl-experiments

Users who are interested in rl-experiments are comparing it to the libraries listed below.

catena-labs / moa-llm
A Python library to orchestrate LLMs in a neural network-inspired structure
☆49 Updated 9 months ago
severian42 / Computational-Model-for-Symbolic-Representations
Glyphs, acting as collaboratively defined symbols linking related concepts, add a layer of multidimensional semantic richness to user-AI …
☆49 Updated 5 months ago
reka-ai / rekaquant
☆58 Updated 3 weeks ago
attashe / ModifiedBeamSampler
Modified Beam Search with periodical restart
☆12 Updated 10 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆28 Updated 2 months ago
Cerebras / DocChat
GPT-4 Level Conversational QA Trained In a Few Hours
☆63 Updated 11 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆55 Updated 5 months ago
EduardTalianu / EntropixLab
entropix style sampling + GUI
☆26 Updated 9 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 6 months ago
bradhilton / temporal-clue
Clue inspired puzzles for testing LLM deduction abilities
☆38 Updated 4 months ago
AtakanTekparmak / agento
Very minimal (and stateless) agent framework
☆44 Updated 6 months ago
desik1998 / MathWithLLMs
☆49 Updated last year
diicellman / dynamite-dogs
BH hackathon
☆14 Updated last year
brendanhogan / completion_tree_view
☆13 Updated 3 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆101 Updated 4 months ago
phunterlau / paper_without_code
An LLM reads a paper and produces a working prototype
☆58 Updated 3 months ago
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆140 Updated 5 months ago
Doriandarko / MLX-GRPO
A pure MLX-based training pipeline for fine-tuning LLMs using GRPO on Apple Silicon.
☆42 Updated 6 months ago
broskicodes / chess-position-embeddings
code for training and using chess embeddings models
☆12 Updated last year
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated 5 months ago
andrewginns / CoT-at-Home
Who needs o1 anyways. Add CoT to any OpenAI compatible endpoint.
☆43 Updated 10 months ago
Glavin001 / Data2AITextbook
🚀 Automatically convert unstructured data into a high-quality 'textbook' format, optimized for fine-tuning Large Language Models (LLMs)
☆25 Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆81 Updated 2 months ago
Alignment-Lab-AI / KnowledgeBase
never forget anything again! combine AI and intelligent tooling for a local knowledge base to track catalogue, annotate, and plan for you…
☆37 Updated last year
axolotl-ai-cloud / axolotl-cookbook
☆34 Updated 4 months ago
matthewrenze / jhu-concise-cot
The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models
☆22 Updated 8 months ago
ritabratamaiti / AnyModal
AnyModal is a Flexible Multimodal Language Model Framework for PyTorch
☆101 Updated 7 months ago
arcee-ai / DAM
☆53 Updated 8 months ago
raphaelmansuy / iteration_of_tought
Example implementation of Iteration of Thought. Give it a star if you like the project.
☆42 Updated 7 months ago
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.
☆33 Updated last year