multimodal-art-projection / COIG-P
☆36 Updated last month
Alternatives and similar repositories for COIG-P
Users who are interested in COIG-P are comparing it to the repositories listed below.
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆50 Updated 5 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆73 Updated this week
- [ACL 2025, Main Conference] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ☆28 Updated 10 months ago
- [ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆63 Updated 7 months ago
- Official completion of “Training on the Benchmark Is Not All You Need”. ☆32 Updated 5 months ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis ☆61 Updated this week
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆77 Updated last year
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆97 Updated this week
- The code and data for the paper JiuZhang3.0 ☆45 Updated last year
- ☆49 Updated 3 weeks ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆37 Updated 3 months ago
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent ☆57 Updated 3 weeks ago
- A light-weight tool for evaluating LLMs in rule-based ways. ☆54 Updated last week
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆48 Updated 11 months ago
- This repository collects research papers on learning from rewards in the context of post-training and test-time scaling of large language… ☆37 Updated 3 weeks ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆39 Updated 10 months ago
- ☆89 Updated last week
- [arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents ☆30 Updated 2 weeks ago
- The official repository of the Omni-MATH benchmark. ☆83 Updated 5 months ago
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024) ☆23 Updated 9 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆45 Updated 6 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆47 Updated 5 months ago
- ☆36 Updated 9 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆54 Updated 2 weeks ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆32 Updated last year
- The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆102 Updated this week
- ☆42 Updated 3 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆106 Updated last month
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆138 Updated 3 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆98 Updated last month