yuelinan / Awesome-Efficient-R1-style-LRMs
☆24Updated last week
Alternatives and similar repositories for Awesome-Efficient-R1-style-LRMs
Users who are interested in Awesome-Efficient-R1-style-LRMs are comparing it to the libraries listed below
- Scaling Preference Data Curation via Human-AI Synergy☆97Updated last month
- [ACL 2025] Official PyTorch implementation of the paper "Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement"☆33Updated 2 months ago
- ☆53Updated this week
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆33Updated last year
- ☆14Updated 7 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆53Updated 2 months ago
- ☆49Updated 5 months ago
- ☆24Updated 3 months ago
- ☆38Updated last month
- ☆50Updated 5 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆40Updated 5 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation"☆23Updated last month
- [arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents☆40Updated last month
- ☆24Updated last week
- Extrapolating RLVR to General Domains without Verifiers☆136Updated 2 weeks ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models☆51Updated 2 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆90Updated last year
- ☆102Updated 2 months ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"☆54Updated last year
- ☆22Updated last year
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆70Updated 3 weeks ago
- [ICML2025] The official implementation of "C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Gene…☆38Updated 3 months ago
- [NeurIPS 2024] An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding☆18Updated 10 months ago
- ☆12Updated 6 months ago
- ☆18Updated last month
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories☆34Updated last week
- ☆16Updated 8 months ago
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis'☆26Updated 2 months ago
- Unsupervised GRPO☆41Updated 2 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Updated 7 months ago