rosmineb / unit_test_rl

Project code for training LLMs to write better unit tests + code

☆21

Alternatives and similar repositories for unit_test_rl

Users who are interested in unit_test_rl are comparing it to the repositories listed below.

s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59 Updated 2 months ago
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated 11 months ago
brendanhogan / picoDeepResearch
☆68 Updated 7 months ago
xjdr-alt / llmri
look how they massacred my boy
☆63 Updated last year
brendanhogan / completion_tree_view
☆15 Updated 8 months ago
xjdr-alt / muzero_sketch
☆40 Updated last year
Pleias / Quest-Best-Tokens
An introduction to LLM Sampling
☆79 Updated last year
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆100 Updated 5 months ago
anpaure / cp_eval
Tiny evaluation of leading LLMs on competitive programming problems
☆14 Updated last year
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆107 Updated 10 months ago
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆85 Updated 4 months ago
axeld5 / pali_reason
Testing paligemma2 finetuning on reasoning dataset
☆18 Updated last year
smolorg / smoltropix
MLX port for xjdr's entropix sampler (mimics jax implementation)
☆61 Updated last year
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 Updated last year
AnswerDotAI / ModernBERT-Instruct-mini-cookbook
☆53 Updated 10 months ago
teknium1 / transformers-gptq-quant
☆45 Updated 2 years ago
collinear-ai / spider
Streamline on-policy/off-policy distillation workflows in a few lines of code
☆86 Updated this week
huggingface / wikirace-llms
☆25 Updated 8 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆30 Updated 7 months ago
enjalot / latent-data-modal
Using modal.com to process FineWeb-edu data
☆20 Updated 9 months ago
JoshuaPurtell / SmallBench
Small, simple agent task environments for training and evaluation
☆19 Updated last year
reka-ai / rekaquant
☆62 Updated 5 months ago
catid / lllm
Latent Large Language Models
☆19 Updated last year
matthewrenze / jhu-concise-cot
The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models
☆22 Updated last year
JD-P / RetroInstruct
Synthetic data derived by templating, few-shot prompting, transformations on public domain corpora, and Monte Carlo tree search.
☆32 Updated 3 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆109 Updated 10 months ago
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆61 Updated 8 months ago
axolotl-ai-cloud / axolotl-cookbook
☆36 Updated 5 months ago
SinatrasC / entropix-smollm
smolLM with Entropix sampler on pytorch
☆149 Updated last year
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆73 Updated 8 months ago