haizelabs / j1-micro

j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.

☆99

Alternatives and similar repositories for j1-micro

Users who are interested in j1-micro are comparing it to the repositories listed below.

brendanhogan / picoDeepResearch
☆68 Updated 6 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59 Updated last month
collinear-ai / spider
Streamline on-policy/off-policy distillation workflows in a few lines of code
☆65 Updated last week
xjdr-alt / llmri
look how they massacred my boy
☆63 Updated last year
xjdr-alt / muzero_sketch
☆40 Updated last year
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆60 Updated 6 months ago
PrimeIntellect-ai / genesys
☆136 Updated 8 months ago
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆218 Updated last month
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆107 Updated 8 months ago
alexzhang13 / rlm
Super basic implementation (gist-like) of RLMs with REPL environments.
☆273 Updated last month
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 Updated 9 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 Updated 10 months ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆34 Updated 7 months ago
rosmineb / unit_test_rl
Project code for training LLMs to write better unit tests + code
☆21 Updated 6 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆72 Updated 7 months ago
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆281 Updated 2 months ago
allenai / infinigram-api
☆87 Updated last week
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆134 Updated 7 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆84 Updated 8 months ago
SinatrasC / entropix-smollm
SmolLM with Entropix sampler on PyTorch
☆149 Updated last year
arcee-ai / DAM
☆55 Updated last year
Ziems / arbor
A framework for optimizing DSPy programs with RL
☆285 Updated 2 weeks ago
thomasnormal / fewshot
☆29 Updated last month
doomslide / hyperobject
Plotting (entropy, varentropy) for small LMs
☆99 Updated 6 months ago
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 Updated last year
changjonathanc / llmproc
LLMProc: Unix-inspired runtime that treats LLMs as processes.
☆34 Updated 4 months ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆106 Updated this week
Columbia-NLP-Lab / PAPILLON
Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles
☆60 Updated 6 months ago
ScalingIntelligence / codemonkeys
☆59 Updated 10 months ago
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆60 Updated 2 months ago