uservan/ThinkPO

☆17

Alternatives and similar repositories for ThinkPO

Users who are interested in ThinkPO are comparing it to the repositories listed below.

slime-n / slime-n
A Multi-Policy, Multi-Agent RL Training Framework
☆30 · Jun 16, 2026 · Updated last month
IBM / ColPret
Efficient Scaling laws and collaborative pretraining.
☆22 · Updated this week
psunlpgroup / FoVer
This repository includes code and materials for the paper "Efficient PRM Training Data Synthesis via Formal Verification" (ACL 2026 Findi…
☆18 · Apr 7, 2026 · Updated 3 months ago
Jiahao004 / DeepTheorem
☆26 · Jun 10, 2025 · Updated last year
StigLidu / TURN
[ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"
☆23 · Feb 16, 2025 · Updated last year
shenao-zhang / reward-augmented-preference
The official implementation of Preference Data Reward-Augmentation.
☆18 · May 1, 2025 · Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47 · Sep 22, 2024 · Updated last year
JacobPfau / fillerTokens
☆76 · Apr 27, 2024 · Updated 2 years ago
iLearn-Lab / ACL25-PTQ1.61
☆15 · Apr 6, 2026 · Updated 3 months ago
LHRLAB / KBQA-o1
[ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search".
☆38 · Dec 6, 2025 · Updated 7 months ago
StarDewXXX / AdaR1
The official repository of NeurIPS'25 paper "Ada-R1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization"
☆24 · May 6, 2026 · Updated 2 months ago
JindongGu / SimDis
A pytorch implementation of the ICCV2021 workshop paper SimDis: Simple Distillation Baselines for Improving Small Self-supervised Models
☆14 · Jul 15, 2021 · Updated 5 years ago
plageon / HierSearch
HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches
☆40 · Oct 9, 2025 · Updated 9 months ago
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆115 · Oct 11, 2025 · Updated 9 months ago
ahxt / mini-r1-zero
☆20 · Feb 2, 2025 · Updated last year
princeton-pli / what-makes-good-rm
[NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective
☆44 · Sep 18, 2025 · Updated 10 months ago
qhjqhj00 / MetaAgent
MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning
☆47 · Sep 3, 2025 · Updated 10 months ago
RUCKBReasoning / CodeRM
Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'
☆27 · May 16, 2025 · Updated last year
icaros-usc / dqd-rl
Official implementation of "Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning"
☆22 · Oct 3, 2022 · Updated 3 years ago
TaiMingLu / know-dont-tell
☆19 · Oct 14, 2024 · Updated last year
HypherX / Evolution-Analysis
☆25 · Dec 13, 2024 · Updated last year
AI45Lab / DEAN
☆11 · Oct 25, 2024 · Updated last year
NineAbyss / S2R
This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"
☆76 · Apr 22, 2025 · Updated last year
henryzhongsc / longctx_bench
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches. EMNLP Findings 2024
☆89 · Feb 27, 2025 · Updated last year
aeroplanepaper / GRPO-LEAD
☆40 · Nov 18, 2025 · Updated 8 months ago
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆78 · Aug 17, 2024 · Updated last year
tanganke / subspace_fusion
Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"
☆14 · Mar 28, 2024 · Updated 2 years ago
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆50 · Sep 23, 2025 · Updated 9 months ago
jaeho-lee / oce
Codes for "Learning bounds for risk-sensitive learning," NeurIPS 2020 (or see arXiv 2006.08138)
☆11 · Oct 15, 2020 · Updated 5 years ago
RUCAIBox / SWE-World
☆49 · Mar 6, 2026 · Updated 4 months ago
shawnricecake / squant
[ICCAD 2025] Squant
☆15 · Jul 3, 2025 · Updated last year
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆126 · May 6, 2025 · Updated last year
tanganke / pareto_set_learning
Code for paper "Towards Efficient Pareto Set Approximation via Weight-Ensembling Mixture of Experts"
☆11 · Sep 13, 2024 · Updated last year
lukahhcm / Awesome_Environment_Scaling
Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to …
☆71 · Jan 28, 2026 · Updated 5 months ago
jeon185 / LaViC
Implementation of LaViC (KDD 2025)
☆13 · Jun 1, 2025 · Updated last year
ZrW00 / MuScleLoRA
The code implementation of MuScleLoRA (Accepted in ACL 2024)
☆10 · Dec 1, 2024 · Updated last year
usail-hkust / benchmark_inference_time_computation_LLM
[NeurIPS 2025] Bag of Tricks for Inference-time Computation of LLM Reasoning
☆16 · Sep 20, 2025 · Updated 10 months ago
THU-BPM / Watermark-Radioactivity-Attack
[ACL 2025 Main] Code and data for paper "Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?"
☆23 · Jun 18, 2025 · Updated last year
ahxt / G2R
[WWW2022] Geometric Graph Representation Learning via Maximizing Rate Reduction
☆26 · May 27, 2022 · Updated 4 years ago