kubernetes-bad / reward-composer
Lego for GRPO
☆25 · Updated last week
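As a rough illustration of what composable ("Lego-style") GRPO rewards can look like, here is a minimal Python sketch. It is not reward-composer's API. The names `combine`, `group_advantages`, `exact_match`, and `brevity` are hypothetical. The sketch sums several weighted reward functions into one score per completion. It then standardizes each score against the other completions sampled for the same prompt, which gives the group-relative advantage GRPO uses in place of a learned critic.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: (completion, reference answer) -> scalar reward
RewardFn = Callable[[str, str], float]


def combine(*parts: Tuple[RewardFn, float]) -> RewardFn:
    """Build one reward function as the weighted sum of several smaller ones."""
    def combined(completion: str, reference: str) -> float:
        return sum(weight * fn(completion, reference) for fn, weight in parts)
    return combined


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward within its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical building blocks
def exact_match(completion: str, reference: str) -> float:
    return 1.0 if completion.strip() == reference.strip() else 0.0


def brevity(completion: str, reference: str) -> float:
    return 1.0 if len(completion) <= 2 * len(reference) else 0.0


# One prompt, a group of four sampled completions
reward = combine((exact_match, 1.0), (brevity, 0.2))
group = ["42", " 42 ", "The answer is probably 42", "41"]
scores = [reward(c, "42") for c in group]
adv = group_advantages(scores)
print(scores)
print([round(a, 3) for a in adv])
```

Completions scoring above their group's mean get positive advantages, and those below it get negative ones. Because the weights live in `combine`, adding or re-weighting a reward term leaves the advantage computation untouched.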
Alternatives and similar repositories for reward-composer:
Users interested in reward-composer are comparing it to the libraries listed below.
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆39 · Updated last month
- Simple GRPO scripts and configurations. ☆58 · Updated last month
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆91 · Updated 2 weeks ago
- Train your own SOTA deductive reasoning model ☆81 · Updated 2 weeks ago
- look how they massacred my boy ☆63 · Updated 5 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆52 · Updated last week
- MLX port for xjdr's entropix sampler (mimics the JAX implementation) ☆63 · Updated 4 months ago
- ☆38 · Updated 7 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆138 · Updated last month
- Clue-inspired puzzles for testing LLM deduction abilities ☆31 · Updated this week
- Cerule - A Tiny Mighty Vision Model ☆67 · Updated 6 months ago
- Using multiple LLMs for ensemble forecasting ☆16 · Updated last year
- Entropix-style sampling + GUI ☆25 · Updated 4 months ago
- ☆48 · Updated 4 months ago
- [WIP] Transformer to embed Danbooru labelsets ☆13 · Updated 11 months ago
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati… ☆39 · Updated 3 weeks ago
- ☆20 · Updated 4 months ago
- SmolLM with the Entropix sampler in PyTorch ☆150 · Updated 4 months ago
- Small, simple agent task environments for training and evaluation ☆18 · Updated 4 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆85 · Updated this week
- Synthetic data derived by templating, few-shot prompting, transformations on public-domain corpora, and Monte Carlo tree search. ☆31 · Updated 3 weeks ago
- ☆66 · Updated 10 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆21 · Updated 4 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆26 · Updated 2 weeks ago
- Modifies entropy-based sampling to work on Apple Silicon via MLX ☆50 · Updated 4 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆59 · Updated 7 months ago
- Collection of autoregressive model implementations ☆83 · Updated last month
- tiny_fnc_engine is a minimal Python library that provides a flexible engine for calling functions extracted from an LLM. ☆38 · Updated 6 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆168 · Updated 2 months ago
- ☆38 · Updated last month