cs-holder / Reasoning-Self-Evolution-Survey
☆40Updated last month
Alternatives and similar repositories for Reasoning-Self-Evolution-Survey:
Users who are interested in Reasoning-Self-Evolution-Survey are comparing it to the repositories listed below
- ☆55Updated 6 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆64Updated last week
- [preprint] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆43Updated 3 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆47Updated 3 months ago
- ☆125Updated 3 weeks ago
- ☆22Updated 9 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆58Updated 4 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆79Updated 2 months ago
- This is the implementation of LeCo☆32Updated 3 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆20Updated 2 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆52Updated 4 months ago
- A comprehensive collection of process reward models.☆67Updated this week
- Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models?☆23Updated last month
- Codebase for Instruction Following without Instruction Tuning☆34Updated 7 months ago
- Code for the arXiv paper "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis"☆24Updated 3 months ago
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning"☆25Updated 10 months ago
- Code for "A Sober Look at Progress in Language Model Reasoning" paper☆36Updated last week
- This is a unified platform for implementing and evaluating test-time reasoning mechanisms in Large Language Models (LLMs).☆15Updated 3 months ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…☆39Updated 9 months ago
- Knowledge-Reasoning Synergy Reinforcement Learning.☆34Updated last month
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent."☆49Updated 5 months ago
- ☆44Updated 5 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆57Updated last year
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning☆94Updated last week
- a-m-team's exploration in large language modeling☆49Updated 3 weeks ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433☆26Updated 4 months ago
- The code and data for the paper JiuZhang3.0☆43Updated 10 months ago
- ☆101Updated 4 months ago
- A research repo for experiments about Reinforcement Finetuning☆44Updated 2 weeks ago
- An Easy-to-use Hallucination Detection Framework for LLMs.☆58Updated last year