VsonicV / es-fine-tuning-paper
This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"
☆229 Updated this week
Alternatives and similar repositories for es-fine-tuning-paper
Users who are interested in es-fine-tuning-paper are comparing it to the libraries listed below.
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 Updated 9 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆103 Updated 3 weeks ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆110 Updated 6 months ago
- Official repo of paper LM2 ☆47 Updated 8 months ago
- RLP: Reinforcement as a Pretraining Objective ☆192 Updated 3 weeks ago
- ☆149 Updated 2 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆297 Updated 2 months ago
- ☆122 Updated 8 months ago
- ☆93 Updated 4 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆116 Updated this week
- DeMo: Decoupled Momentum Optimization ☆194 Updated 10 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆233 Updated 3 months ago
- Storing long contexts in tiny caches with self-study ☆205 Updated last week
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆129 Updated 2 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆72 Updated 6 months ago
- ☆102 Updated 3 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆151 Updated 8 months ago
- Library for text-to-text regression, applicable to any input string representation and allows pretraining and fine-tuning over multiple r… ☆277 Updated last week
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆260 Updated last week
- An AI benchmark for creative, human-like problem solving using Sudoku variants ☆105 Updated 3 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 Updated 10 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆107 Updated 7 months ago
- Exploring Applications of GRPO ☆248 Updated 2 months ago
- Accompanying material for the sleep-time compute paper ☆117 Updated 6 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆124 Updated 5 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆347 Updated 4 months ago
- PyTorch implementation of models from the Zamba2 series. ☆185 Updated 9 months ago
- 📄Small Batch Size Training for Language Models ☆63 Updated 3 weeks ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆188 Updated 7 months ago
- ☆124 Updated 10 months ago