minimal GRPO implementation from scratch
☆103Mar 14, 2025Updated last year
Alternatives and similar repositories for Tiny-GRPO
Users that are interested in Tiny-GRPO are comparing it to the libraries listed below.
- ☆15Feb 26, 2025Updated last year
- Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision☆19Apr 1, 2025Updated last year
- ☆16Aug 7, 2024Updated last year
- Minimal hackable GRPO implementation☆340Jan 31, 2025Updated last year
- See https://github.com/cuda-mode/triton-index/ instead!☆11May 8, 2024Updated 2 years ago
- ☆16Jul 8, 2024Updated last year
- Code and data to reproduce the transfer of a learned posterior on one time series as a new prior on a related time series to model yearly…☆15Jun 16, 2019Updated 6 years ago
- A very simple GRPO implementation for reproducing R1-like LLM thinking.☆1,685Nov 21, 2025Updated 6 months ago
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free☆58Apr 6, 2025Updated last year
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆98Dec 17, 2024Updated last year
- ☆10Jul 8, 2021Updated 4 years ago
- ☆34Nov 18, 2025Updated 6 months ago
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati…☆54May 7, 2025Updated last year
- Code for "What really matters in matrix-whitening optimizers?"☆24Oct 31, 2025Updated 7 months ago
- Code implementation of Baichuan Dynamic NTK-ALiBi: inference on longer texts without fine-tuning☆49Aug 27, 2023Updated 2 years ago
- The official repository for the NLP-KG web application [ACL 2024 Demo].☆14Oct 16, 2025Updated 7 months ago
- Visualize any repo or codebase into diagram or animation☆24Oct 14, 2024Updated last year
- ☆16Oct 17, 2025Updated 7 months ago
- ☆17Aug 1, 2025Updated 10 months ago
- PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration☆45Jan 7, 2026Updated 5 months ago
- Experiments for our CLEAR benchmark of unlearning methods in a multimodal setup☆23Aug 6, 2025Updated 10 months ago
- Apache Arrow-compatible space-efficient "tape" class in pure Rust to be used with StringZilla for GPU, NUMA, and disk transfers of variab…☆31Nov 21, 2025Updated 6 months ago
- ☆10Jun 8, 2024Updated 2 years ago
- (WIP) Long-form speech generation☆31Apr 2, 2025Updated last year
- unofficial pytorch implementation of HiFi-GAN with fast MISR.☆15Mar 21, 2023Updated 3 years ago
- Quick access to any large language model from your browser.☆10Feb 16, 2026Updated 3 months ago
- SCoRe: Training Language Models to Self-Correct via Reinforcement Learning☆16May 14, 2026Updated 3 weeks ago
- Chunk Dedupe Estimation☆20Nov 5, 2024Updated last year
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un…☆19Dec 17, 2025Updated 5 months ago
- Unofficial PyTorch implementation of DALL-E 2 by OpenAI☆10Apr 6, 2022Updated 4 years ago
- Implementation of Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasonin…☆84Sep 8, 2025Updated 9 months ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.☆230May 31, 2025Updated last year
- ☆44Sep 19, 2024Updated last year
- ☆17May 15, 2025Updated last year
- ☆53Oct 29, 2024Updated last year
- Code for D. Matthews, S. Kriegman, C. Cappelle and J. Bongard, "Word2vec to behavior: morphology facilitates the grounding of language in…☆15Apr 2, 2020Updated 6 years ago
- Minute-long video generation at 24FPS.☆68Mar 28, 2026Updated 2 months ago
- Repository of the paper "CritiQ: Mining Data Quality Criteria from Human Preferences". Code for CritiQ Flow & Training CritiQ Scorer.☆23Dec 11, 2025Updated 6 months ago
- Awesome Reasoning LLM Tutorial/Survey/Guide☆2,433Apr 6, 2026Updated 2 months ago