LeonGuertler / UnstableBaselines

☆116

Alternatives and similar repositories for UnstableBaselines

Users who are interested in UnstableBaselines are comparing it to the libraries listed below.

tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆282 Updated 3 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆332 Updated 2 months ago
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆45 Updated 9 months ago
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆340 Updated last week
PrimeIntellect-ai / prime-environments
Curated collection of community environments
☆196 Updated last week
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆165 Updated 8 months ago
Noumena-Network / nmoe
MoE training for Me and You and maybe other people
☆298 Updated 2 weeks ago
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆228 Updated 3 weeks ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆174 Updated 11 months ago
METR / RE-Bench
☆126 Updated 2 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆190 Updated 9 months ago
PrimeIntellect-ai / genesys
☆136 Updated 9 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆152 Updated 10 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆181 Updated 6 months ago
pyember / ember
☆234 Updated 6 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆73 Updated 8 months ago
google-deepmind / mishax
☆147 Updated 3 months ago
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆248 Updated last year
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆341 Updated last month
jerber / lang-jepa
☆131 Updated last year
VsonicV / es-fine-tuning-paper
This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"
☆279 Updated last month
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆141 Updated last year
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆234 Updated 5 months ago
magicproduct / hash-hop
Long context evaluation for large language models
☆224 Updated 9 months ago
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆64 Updated 7 months ago
balrog-ai / BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
☆219 Updated 3 weeks ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆107 Updated 9 months ago
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆251 Updated 4 months ago
SalesforceAIResearch / LaTRO
☆125 Updated 10 months ago
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆146 Updated 8 months ago