This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"
☆353 · Feb 18, 2026 · Updated 2 months ago
Alternatives and similar repositories for es-fine-tuning-paper
Users that are interested in es-fine-tuning-paper are comparing it to the libraries listed below.
- ☆78 · Feb 18, 2026 · Updated 2 months ago
- ☆28 · Jul 18, 2025 · Updated 9 months ago
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆63 · Jan 5, 2026 · Updated 3 months ago
- Ludic – an LLM-RL library for the era of experience ☆62 · Jan 9, 2026 · Updated 3 months ago
- Martingale posterior neural networks for fast sequential decision making @ NeurIPS 2025 ☆25 · Nov 13, 2025 · Updated 5 months ago
- CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval ☆25 · Jun 28, 2025 · Updated 10 months ago
- Language modeling with linear-cost context ☆118 · Sep 25, 2025 · Updated 7 months ago
- An implementation of the Pair Adjacent Violators algorithm for isotonic regression in Rust ☆12 · Mar 25, 2026 · Updated last month
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆188 · Jul 23, 2025 · Updated 9 months ago
- ☆15 · Mar 25, 2024 · Updated 2 years ago
- ☆148 · Sep 29, 2025 · Updated 7 months ago
- Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML ☆59 · Dec 12, 2025 · Updated 4 months ago
- ☆548 · Mar 30, 2026 · Updated last month
- Project code for training LLMs to write better unit tests + code ☆21 · May 19, 2025 · Updated 11 months ago
- Code for paper "Analog Foundation Models" ☆33 · Mar 25, 2026 · Updated last month
- Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs" ☆33 · Oct 12, 2025 · Updated 6 months ago
- R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning ☆39 · Feb 9, 2026 · Updated 2 months ago
- Verlog: A Multi-turn RL framework for LLM agents ☆74 · Updated this week
- Vanilla-Python ergonomics on top of DSPy ☆40 · Jun 3, 2025 · Updated 11 months ago
- Implementation of SOAR ☆52 · Sep 17, 2025 · Updated 7 months ago
- Agentic RL Training at Scale ☆1,323 · Updated this week
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26] ☆64 · Apr 11, 2026 · Updated 3 weeks ago
- ☆28 · Jan 8, 2026 · Updated 3 months ago
- [ICLR2026] Test-Time Scaling with Reflective Generative Model ☆302 · Jan 28, 2026 · Updated 3 months ago
- Website for Princeton's undergraduate reinforcement learning course ☆15 · May 12, 2025 · Updated 11 months ago
- Universal LLM security auditor with automated jailbreak testing, DSPy optimization, and OWASP 2025-aligned attack patterns ☆21 · Oct 23, 2025 · Updated 6 months ago
- Resa: Transparent Reasoning Models via SAEs ☆48 · Sep 23, 2025 · Updated 7 months ago
- RLM for coding agent ☆103 · Feb 19, 2026 · Updated 2 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆49 · Feb 4, 2026 · Updated 2 months ago
- Code to estimate DunedinPACNI scores from FreeSurfer parcellations of brain MRI data. ☆43 · Sep 20, 2025 · Updated 7 months ago
- Auto-resize oversized images and repair corrupted sessions in Claude Code ☆26 · Dec 8, 2025 · Updated 4 months ago
- ☆13 · Apr 5, 2023 · Updated 3 years ago
- Simple Scalable Discrete Diffusion for text in PyTorch ☆37 · Sep 27, 2024 · Updated last year
- Universal Reasoning Model ☆128 · Jan 15, 2026 · Updated 3 months ago
- ☆26 · Mar 4, 2026 · Updated last month
- Training framework with a goal to explore the frontier of sample efficiency of small language models ☆99 · Jan 25, 2026 · Updated 3 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆362 · Jun 23, 2025 · Updated 10 months ago
- Scaling RL on advanced reasoning models ☆675 · Oct 20, 2025 · Updated 6 months ago
- ☆20 · Mar 25, 2025 · Updated last year