Train your grpo with zero dataset and low resources, 8bit/4bit/lora/qlora supported, multi-gpu supported ...
☆80Apr 30, 2025Updated last year
Alternatives and similar repositories for grpo-flat
Users that are interested in grpo-flat are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- From Llama to Deepseek, grpo/mtp implemented. With pt/sft/lora/qlora included☆30Apr 21, 2025Updated last year
- [NeurIPS 2023] LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition☆19May 26, 2024Updated 2 years ago
- ☆11Nov 18, 2023Updated 2 years ago
- ☆20Nov 3, 2024Updated last year
- code for paper Sparse Structure Search for Delta Tuning☆11Oct 16, 2022Updated 3 years ago
- Simple, predictable pricing with DigitalOcean hosting • AdAlways know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
- Functional Optimal Transport: Map Estimation and Domain Adaptation for Functional data☆27Jun 7, 2021Updated 5 years ago
- Official Code Repo for the paper "Learning to Play Atari in a World of Tokens" accepted at ICML, 2024☆11Jun 6, 2024Updated 2 years ago
- This is the source code of FUSION, a safety-aware causal representation for generalizable driving agents.☆26Oct 23, 2024Updated last year
- This is official project in our paper: Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers☆31Jan 13, 2024Updated 2 years ago
- ☆17Jun 10, 2025Updated last year
- [CHIL 2024] Interpretation of Intracardiac Electrograms Through Textual Representations☆12Sep 4, 2024Updated last year
- ☆48Feb 10, 2025Updated last year
- Ground-Aware Point Cloud Semantic Segmentation for Autonomous Driving. ACM Multimedia 2019.☆12Sep 19, 2019Updated 6 years ago
- Matrix Product State algorithm for computing characters of the symmetric group S_n☆11Sep 26, 2025Updated 8 months ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- D3PE (Deep Data-Driven Policy Evaluation) aims to evaluation a large set of candidate policies from a fixed dataset to select best ones.☆10Jun 2, 2022Updated 4 years ago
- ☆15Dec 16, 2021Updated 4 years ago
- 简单易理解的代码,用于在qwen上使用grpo加强数学能力☆57May 14, 2025Updated last year
- https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation☆11Sep 11, 2019Updated 6 years ago
- ⚠️ ARCHIVED - All development moved to https://github.com/itbench-hub/ITBench/tree/main/scenarios☆15Feb 24, 2026Updated 3 months ago
- 基于Bart语言模型的指针生成网络,用于中文语法纠错任务☆16Sep 8, 2022Updated 3 years ago
- 基于bert4keras的SuperGLUE基准代码☆14Jun 25, 2022Updated 3 years ago
- Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision☆19Apr 1, 2025Updated last year
- ☆11Jun 14, 2019Updated 6 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- This is the code for Q-value Path Decomposition for Deep Multiagent Reinforcement Learning (NeurIPS 2019).☆12May 20, 2019Updated 7 years ago
- ☆12Feb 6, 2021Updated 5 years ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"☆64Apr 4, 2025Updated last year
- Unity Release Notes archive for search across releases.☆16Jun 5, 2026Updated last week
- About Code release for "Imagination Mechanism: Mesh Information Propagation for Enhancing Data Efficiency in Reinforcement Learning"☆13Oct 7, 2023Updated 2 years ago
- Code tasks for NJU TSA (Time Series Analysis) course☆13Jan 24, 2024Updated 2 years ago
- ☆14Mar 7, 2022Updated 4 years ago
- [EMNLP-2025] R1-Zero on ANY TASK☆30Nov 9, 2025Updated 7 months ago
- ☆21Sep 12, 2023Updated 2 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- ☆13Jul 13, 2022Updated 3 years ago
- Here is my implementation of Center Loss with Keras☆11May 2, 2018Updated 8 years ago
- Explanation of the llama2 repo.☆12Jul 18, 2024Updated last year
- We design a spectral compression mapping (SCM) for full-band speech enhancement, and propose a two-stage stream named MHA-DPCRN☆24Jul 4, 2022Updated 3 years ago
- ☆14Mar 11, 2022Updated 4 years ago
- An open-source framework to benchmark and assess safety specifications of Reinforcement Learning problems.☆14Aug 25, 2023Updated 2 years ago
- ☆21Dec 24, 2024Updated last year