zli12321/free-form-grpo

zli12321 / free-form-grpo

GRPO to train long-form QA and instructions with a long-form reward model

☆17

Alternatives and similar repositories for free-form-grpo

Users who are interested in free-form-grpo are comparing it to the libraries listed below.

zli12321 / VideoHallu
Synthetic Video hallucination and Mitigation
☆23 · Sep 21, 2025 · Updated 10 months ago
zli12321 / Vision-SR1
Reinforcement Learning of Vision Language Models with Self Visual Perception Reward
☆175 · Mar 14, 2026 · Updated 4 months ago
zli12321 / MM-Zero
Self-evolving vision language models from zero data
☆77 · Mar 14, 2026 · Updated 4 months ago
zli12321 / Vision-Language-Models-Overview
A frontier collection and survey of vision-language model papers and model GitHub repositories. Continuously updated.
☆670 · Jul 8, 2026 · Updated last week
zli12321 / FFGO-Video-Customization
Video Content Customization Using First Frame
☆193 · Mar 17, 2026 · Updated 4 months ago
microsoft / ConstrainedReasoner
☆13 · Aug 26, 2024 · Updated last year
edwinhu / workflows
☆17 · Updated this week
chenyaofo / CCA-Attention
☆20 · Aug 14, 2025 · Updated 11 months ago
m-hedderich / PyPremise
PyPremise - Python tool for the Premise algorithm to identify patterns or explanations of where a machine learning classifier performs we…
☆22 · Oct 27, 2025 · Updated 8 months ago
SagnikMukherjee / sparsity_in_rl
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
☆15 · Oct 20, 2025 · Updated 9 months ago
Hongyang-Du / awesome-3d-datasets
[CVPRW'26] A collection and survey of 3d dataset
☆33 · Jun 4, 2026 · Updated last month
huiwang98 / DRL4Recsys
Courses on Deep Reinforcement Learning (DRL) and DRL papers for recommender systems
☆13 · Jul 7, 2022 · Updated 4 years ago
shiweijiezero / R3L
☆23 · Apr 5, 2026 · Updated 3 months ago
princeton-nlp / WhatICLLearns
[ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning
☆21 · Jul 9, 2023 · Updated 3 years ago
zhangxy-2019 / RetroAgent
RETROAGENT: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
☆26 · Mar 30, 2026 · Updated 3 months ago
zhjohnchan / awesome-disentanglement-in-nlp
A curated list of disentanglement in NLP. :-)
☆17 · Oct 31, 2021 · Updated 4 years ago
princeton-nlp / unintentional-unalignment
[ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
☆32 · Jan 7, 2026 · Updated 6 months ago
PKU-TANGENT / ConFiguRe
Dataset and baseline for Coling 2022 long paper (oral): "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech"
☆12 · Jul 27, 2023 · Updated 2 years ago
pkunlp-icler / MLS
Source code of our paper "Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation" @ ACL 2022
☆13 · Apr 13, 2022 · Updated 4 years ago
Trustworthy-Information-Access / LLM-Knowledge-Boundary-Perception-via-Internal-States
Official code for the paper Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception. The code is based on t…
☆22 · Aug 5, 2025 · Updated 11 months ago
syncdoth / Chain-of-Hindsight-PyTorch
Unofficial implementation of Chain of Hindsight (https://arxiv.org/abs/2302.02676) using pytorch and huggingface Trainers.
☆11 · Apr 5, 2023 · Updated 3 years ago
M3-IT / YING-VLM
Vision Large Language Models trained on M3IT instruction tuning dataset
☆17 · Aug 16, 2023 · Updated 2 years ago
launchnlp / LitCab
☆25 · Jun 10, 2025 · Updated last year
chenllliang / ParetoMNMT
Source code for paper "On the Pareto Front of Multilingual Neural Machine Translation" @ NeurIPS 2023
☆17 · Sep 27, 2023 · Updated 2 years ago
lifan-yuan / FactMix
Code for COLING 2022 paper "FactMix: Using a Few Labeled In-domain Examples to Generalize to Cross-domain Named Entity Recognition"
☆15 · Jan 15, 2023 · Updated 3 years ago
StefanHeng / ProgGen
Code for paper "ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models"
☆17 · Mar 29, 2024 · Updated 2 years ago
danzilberdan / devcontainers
devcontainers features
☆23 · Oct 8, 2025 · Updated 9 months ago
run-llama / gemini-live-demo
Demo code for Gemini Live Integration
☆13 · Jul 29, 2025 · Updated 11 months ago
JiaQiSJTU / FaithEval-FFLM
A zero-shot faithfulness evaluation metric for text summarization
☆11 · Oct 17, 2023 · Updated 2 years ago
dqxiu / KAssess
☆14 · Oct 28, 2023 · Updated 2 years ago
nju-websoft / EPR-KGQA
Enhancing Complex Question Answering over Knowledge Graphs through Evidence Pattern Retrieval, WWW 2024
☆15 · Oct 22, 2024 · Updated last year
Wsky51 / TsinghuaJS
Notes written in preparation for the 2020 Tsinghua University computer science graduate re-examination programming test
☆11 · Apr 17, 2023 · Updated 3 years ago
ikergarcia1996 / Sequence-Labeling-LLMs
The code to perform Sequence Labelling with LLMs, including T5, FLAN, LLaMA, Alpaca and more!
☆14 · Nov 5, 2024 · Updated last year
lichuang / ucore-codedump
Annotated version of the Tsinghua University ucore project code
☆12 · May 6, 2017 · Updated 9 years ago
victor7246 / gated-Transformer
Gated Pretrained Transformer model for robust denoised sequence-to-sequence modelling
☆10 · May 29, 2021 · Updated 5 years ago
HanNight / AdaCAD
Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"
☆16 · Mar 2, 2026 · Updated 4 months ago
luka-group / lite
This is the repository for the resources in TACL 2022 Paper "Ultra-fine Entity Typing with Indirect Supervision from Natural Language Inf…
☆14 · Aug 17, 2022 · Updated 3 years ago
shawnwun / woz
The wizard of oz code used for collecting goal-oriented dialogue systems
☆13 · Oct 30, 2017 · Updated 8 years ago
emmyqin / iw_sft
☆28 · Jul 18, 2025 · Updated last year