zhourunlong/Reflect-RL

zhourunlong / Reflect-RL

Reflect-RL: Two-Player Online RL Fine-Tuning for LMs

☆18

Alternatives and similar repositories for Reflect-RL

Users who are interested in Reflect-RL are comparing it to the repositories listed below.

yanxue7 / E3T-Overcooked
☆15 · May 4, 2024 · Updated 2 years ago
ryoungj / BoLT
Code for "Reasoning to Learn from Latent Thoughts"
☆134 · Mar 28, 2025 · Updated last year
Linzwcs / AFT
☆13 · Jan 22, 2025 · Updated last year
limenlp / safer-instruct
This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17 · Feb 22, 2024 · Updated 2 years ago
EIT-NLP / UniToolCall
☆21 · Jul 12, 2026 · Updated last week
Stanford-Sustainable-Systems-Lab / speech-grid-impact
☆13 · Aug 29, 2022 · Updated 3 years ago
guyuntian / CoT_benchmark
Code for "Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective"
☆21 · Jul 16, 2023 · Updated 3 years ago
DanieleGammelli / graph-rl-for-network-optimization
☆16 · Jan 26, 2023 · Updated 3 years ago
axeld5 / pali_reason
Testing PaliGemma 2 fine-tuning on a reasoning dataset
☆18 · Dec 28, 2024 · Updated last year
facebookresearch / taskmet
TaskMet: Task-driven Metric Learning for Model Learning
☆21 · Feb 9, 2024 · Updated 2 years ago
jyao97 / RL-for-MSRs
[AISY 2023] An implementation of using RL to control magnetic soft robots.
☆10 · Jul 29, 2024 · Updated last year
INFERLab / PROF
☆12 · Sep 15, 2021 · Updated 4 years ago
jihwan-jeong / xaddpy
☆12 · Mar 14, 2024 · Updated 2 years ago
cvxpy / cvxtorch
Convert CVXPY expressions to PyTorch expressions
☆18 · Jul 8, 2025 · Updated last year
LOTUS-Lab-at-VT / DD-SAA
Code for the paper "Data-Driven Sample Average Approximation with Covariate Information"
☆13 · Aug 13, 2022 · Updated 3 years ago
XiongPengNUS / PandaShifu
☆18 · May 14, 2026 · Updated 2 months ago
RUCAIBox / ChainLM
☆31 · Mar 23, 2024 · Updated 2 years ago
tijana-zrnic / cross-ppi
Cross-prediction-powered inference
☆15 · Apr 26, 2024 · Updated 2 years ago
wesg52 / sindy_mio_paper
Code for "Learning Sparse Nonlinear Dynamics via Mixed Integer Optimization"
☆16 · Jun 13, 2022 · Updated 4 years ago
INFERLab / COHORT
COHORT: Coordination of Heterogeneous Thermostatically Controlled Loads for Demand Flexibility
☆14 · Feb 3, 2021 · Updated 5 years ago
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆119 · Jun 15, 2024 · Updated 2 years ago
Lingkai-Kong / so-ebm
Code for the paper "End-to-end Stochastic Optimization with Energy-based Model"
☆16 · Feb 14, 2023 · Updated 3 years ago
csguoh / OBR
[ICLR2026] The first W4A4KV4 quantized + 50% sparse LLMs!
☆33 · Jan 26, 2026 · Updated 5 months ago
RUCKBReasoning / CoT-based-Synthesizer
Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis'
☆32 · May 19, 2025 · Updated last year
yliu-cs / PiTe
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17 · Feb 13, 2025 · Updated last year
Harry67Hu / CORY
Official implementation of the NeurIPS 2024 paper CORY
☆33 · Mar 4, 2026 · Updated 4 months ago
polixir / morec
☆10 · Mar 11, 2024 · Updated 2 years ago
facebookresearch / SurCo
Repo for the ICML'23 paper "SurCo: Learning Linear Surrogates For Combinatorial Nonlinear Optimization Problems"
☆18 · Jul 11, 2023 · Updated 3 years ago
Raj-08 / Q-Flow
Complete Reinforcement Learning Toolkit for Large Language Models!
☆21 · Aug 2, 2025 · Updated 11 months ago
dinobby / Skill-MoE
The code implementation of Skill-MoE
☆46 · May 22, 2026 · Updated 2 months ago
stellatogrp / l2ws
☆17 · Jul 15, 2026 · Updated last week
jypark0 / bmil
☆11 · Nov 1, 2022 · Updated 3 years ago
nuomizai / T2VLM
[ICCV'25] T2-VLM: Training-Free Generation of Temporally Consistent Rewards from VLMs
☆16 · Jul 8, 2025 · Updated last year
stellatogrp / cvxro
Convex Optimization under Uncertainty
☆33 · Mar 31, 2026 · Updated 3 months ago
vivoutlaw / tcbp
Temporal Compact Bilinear Pooling (TCBP)
☆11 · May 27, 2020 · Updated 6 years ago
JayMan91 / NeurIPSIntopt
Implementation of the NeurIPS 2020 paper "Interior Point Solving for LP-based prediction+optimisation".
☆21 · May 16, 2024 · Updated 2 years ago
EIT-NLP / BLEUless_DocMT
☆14 · Nov 19, 2024 · Updated last year
JunShern / few-shot-adaptation
Exploring Few-Shot Adaptation of Language Models with Tables
☆25 · Aug 22, 2022 · Updated 3 years ago
WEIRDLabUW / sgft
☆22 · Feb 6, 2025 · Updated last year