tokenbender / avataRL

rl from zero pretrain, can it be done? yes.

☆277

Alternatives and similar repositories for avataRL

Users who are interested in avataRL are comparing it to the libraries listed below.

LeonGuertler / UnstableBaselines
☆105 · Updated this week
PrimeIntellect-ai / prime-environments
Training-Ready RL Environments + Evals
☆132 · Updated this week
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆248 · Updated 2 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172 · Updated 9 months ago
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆201 · Updated last week
nano-R1 / resources
Compiling useful links, papers, benchmarks, ideas, etc.
☆45 · Updated 7 months ago
PrimeIntellect-ai / prime-rl
Async RL Training at Scale
☆722 · Updated this week
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆163 · Updated 6 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆291 · Updated 2 weeks ago
PrimeIntellect-ai / genesys
☆135 · Updated 7 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆297 · Updated 2 months ago
eqimp / hogwild_llm
Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache
☆127 · Updated 2 months ago
jerber / lang-jepa
☆124 · Updated 10 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆108 · Updated 7 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆79 · Updated 7 months ago
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆132 · Updated 5 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆342 · Updated 10 months ago
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆260 · Updated this week
JoeLi12345 / nGPT
An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆107 · Updated 7 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆187 · Updated 7 months ago
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆299 · Updated last month
Danau5tin / calculator_agent_rl
Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a 62% absolute increase in evaluation accuracy.
☆57 · Updated 5 months ago
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆330 · Updated 11 months ago
magicproduct / hash-hop
Long context evaluation for large language models
☆224 · Updated 7 months ago
groundlight / r1_vlm
Build your own visual reasoning model
☆413 · Updated 2 weeks ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆72 · Updated 6 months ago
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆139 · Updated last year
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆103 · Updated 2 weeks ago
SalesforceAIResearch / LaTRO
☆122 · Updated 8 months ago
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆233 · Updated 3 months ago