☆17Aug 1, 2025Updated 6 months ago
Alternatives and similar repositories for ThinkPO
Users that are interested in ThinkPO are comparing it to the libraries listed below
Sorting:
- Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning☆28Jul 14, 2025Updated 7 months ago
- ☆16Feb 22, 2025Updated last year
- ☆14Mar 20, 2025Updated 11 months ago
- ☆33Nov 18, 2025Updated 3 months ago
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows☆19Nov 4, 2025Updated 3 months ago
- ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation☆25Aug 24, 2025Updated 6 months ago
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs☆29Dec 24, 2025Updated 2 months ago
- [ICML'25] "Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding" by Jiajun Zhu, Peihao Wang, Ruisi…☆14Jun 6, 2025Updated 8 months ago
- Efficient Scaling laws and collaborative pretraining.☆21Sep 18, 2025Updated 5 months ago
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality☆40Dec 1, 2025Updated 2 months ago
- ☆19Jun 4, 2025Updated 8 months ago
- ☆21Jul 21, 2025Updated 7 months ago
- [ICML 2025] Official resources of "KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search".☆35Dec 6, 2025Updated 2 months ago
- ☆35May 16, 2025Updated 9 months ago
- CS194-196 Course Project☆14Feb 20, 2025Updated last year
- ☆18Oct 14, 2024Updated last year
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"☆110Oct 11, 2025Updated 4 months ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆42Sep 18, 2025Updated 5 months ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆18Oct 17, 2025Updated 4 months ago
- Quantize What Counts: Bit Allocation Insights Informed by Spectral Gaps in Keys and Values☆20Nov 7, 2025Updated 3 months ago
- A comprehensive and efficient long-context model evaluation framework☆31Feb 8, 2026Updated 3 weeks ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆19Mar 10, 2025Updated 11 months ago
- [ACL 2025 Findings] Text2World: Benchmarking Large Language Models for Symbolic World Model Generation☆28Feb 25, 2025Updated last year
- ☆18Oct 28, 2025Updated 4 months ago
- The official implementation of Preference Data Reward-Augmentation.☆18May 1, 2025Updated 9 months ago
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"☆17Mar 31, 2025Updated 11 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers☆27Mar 1, 2025Updated 11 months ago
- Using FlexAttention to compute attention with different masking patterns☆47Sep 22, 2024Updated last year
- ☆74Apr 27, 2024Updated last year
- Code and data for paper "Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?". (ACL 2025 Main)☆21Jun 18, 2025Updated 8 months ago
- [EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality☆19Mar 4, 2025Updated 11 months ago
- ☆20Feb 2, 2025Updated last year
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches☆37Oct 9, 2025Updated 4 months ago
- ☆25Jun 10, 2025Updated 8 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆74Apr 22, 2025Updated 10 months ago
- ☆21May 3, 2025Updated 9 months ago
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning"☆33Jul 25, 2025Updated 7 months ago
- Automating Sub-Agent Creation for Agentic Orchestration☆58Feb 11, 2026Updated 2 weeks ago
- [ICML2025] Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"☆21Feb 16, 2025Updated last year