mkurman / grpo-llm-evaluator
Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluations.
☆51 · May 7, 2025 · Updated 9 months ago
Alternatives and similar repositories for grpo-llm-evaluator
Users who are interested in grpo-llm-evaluator are comparing it to the libraries listed below.
- ☆19 · Mar 10, 2025 · Updated 11 months ago
- coded with and corrected by Google Anti-Gravity ☆13 · Nov 23, 2025 · Updated 2 months ago
- ☆15 · Apr 26, 2025 · Updated 9 months ago
- this is based on the paper Chain-of-Retrieval Augmented Generation ☆14 · Mar 29, 2025 · Updated 10 months ago
- ☆17 · Feb 1, 2024 · Updated 2 years ago
- NanoGPT (124M) in 5 minutes ☆14 · Feb 14, 2025 · Updated last year
- Interpreting Learned Search and Planning: Reverse-engineering recurrent convolutional networks (DRC) that play Sokoban ☆17 · Jun 29, 2025 · Updated 7 months ago
- ☆16 · Jan 26, 2025 · Updated last year
- [ACL 2025] Knowledge Unlearning for Large Language Models ☆48 · Sep 18, 2025 · Updated 4 months ago
- Official Project Page for HLA: Higher-order Linear Attention (https://arxiv.org/abs/2510.27258) ☆44 · Jan 6, 2026 · Updated last month
- Generating Easy-to-Understand Referring Expressions for Target Identifications ☆18 · Aug 30, 2019 · Updated 6 years ago
- Explore training for quantized models ☆26 · Jul 12, 2025 · Updated 7 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆150 · Jan 7, 2026 · Updated last month
- ☆20 · Aug 1, 2024 · Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆59 · Oct 18, 2025 · Updated 3 months ago
- ☆24 · Jan 22, 2025 · Updated last year
- ☆30 · Mar 11, 2025 · Updated 11 months ago
- minimal pytorch implementation of bm25 (with sparse tensors) ☆104 · Oct 28, 2025 · Updated 3 months ago
- Vietnamese long form question answering system with documents retrieval. ☆21 · Mar 28, 2024 · Updated last year
- A collection of lightweight interpretability scripts to understand how LLMs think ☆89 · Feb 10, 2026 · Updated last week
- Lego for GRPO ☆30 · May 27, 2025 · Updated 8 months ago
- Exploring Applications of GRPO ☆250 · Aug 25, 2025 · Updated 5 months ago
- Software Engineering Back End Microservices Project ☆15 · Nov 20, 2024 · Updated last year
- User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou… ☆28 · May 3, 2025 · Updated 9 months ago
- Minimal implementation of the paper "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (arXiv:2401.01335) ☆29 · Mar 1, 2024 · Updated last year
- Pytorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training ☆36 · Jun 20, 2025 · Updated 7 months ago
- ☆48 · Aug 12, 2025 · Updated 6 months ago
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆47 · Jul 18, 2025 · Updated 6 months ago
- [AAAI26] LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs ☆52 · Dec 7, 2025 · Updated 2 months ago
- Tiny Agent: Production-Ready LLM Agent SDK for Every Developer ☆35 · Sep 29, 2025 · Updated 4 months ago
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ☆34 · May 17, 2025 · Updated 8 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆40 · Aug 7, 2025 · Updated 6 months ago
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro… ☆46 · Sep 2, 2025 · Updated 5 months ago
- An introduction to LLM Sampling ☆79 · Dec 15, 2024 · Updated last year
- Support for training SSD on TF2 ☆12 · Mar 29, 2023 · Updated 2 years ago
- Use MobileNet SSD and OpenCV to detect and count cars on the road ☆12 · Jan 13, 2020 · Updated 6 years ago
- In this repo, I developed a step-by-step pipeline for a standard MultiSpeaker Text-to-Speech system. In general, I used Portaspeech as an… ☆12 · Nov 24, 2023 · Updated 2 years ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆87 · Mar 23, 2025 · Updated 10 months ago
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆42 · Sep 23, 2023 · Updated 2 years ago