cgftinc / benchmax
Framework-Agnostic RL Environments for LLM Fine-Tuning
☆26 Updated this week
Alternatives and similar repositories for benchmax
Users who are interested in benchmax are comparing it to the libraries listed below.
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆55 Updated 5 months ago
- ☆53 Updated 8 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆78 Updated last week
- Nexusflow function call, tool use, and agent benchmarks. ☆27 Updated 7 months ago
- ☆50 Updated last year
- Storing long contexts in tiny caches with self-study ☆117 Updated this week
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆22 Updated 8 months ago
- Lego for GRPO ☆28 Updated 2 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆81 Updated 2 months ago
- entropix style sampling + GUI ☆26 Updated 9 months ago
- ☆64 Updated 2 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆63 Updated 11 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 Updated 6 months ago
- Using modal.com to process FineWeb-edu data ☆20 Updated 3 months ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 9 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆36 Updated last week
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆35 Updated last week
- ☆34 Updated 4 months ago
- Latent Large Language Models ☆18 Updated 11 months ago
- ☆21 Updated last week
- ☆19 Updated 11 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆45 Updated 2 months ago
- Pre-training code for CrystalCoder 7B LLM ☆55 Updated last year
- Data preparation code for CrystalCoder 7B LLM ☆45 Updated last year
- Train your own SOTA deductive reasoning model ☆101 Updated 4 months ago
- ☆58 Updated 3 weeks ago
- ☆66 Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 Updated last year
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆49 Updated 9 months ago
- Modified Beam Search with periodical restart ☆12 Updated 10 months ago