IAAR-Shanghai / SEAP
☆20Updated this week
Alternatives and similar repositories for SEAP
Users who are interested in SEAP are comparing it to the repositories listed below
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations☆96Updated last month
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆67Updated 3 months ago
- The demo, code and data of FollowRAG☆72Updated 3 weeks ago
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"☆50Updated 10 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆34Updated 10 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.☆44Updated this week
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning"☆62Updated 3 weeks ago
- ☆76Updated last week
- [arxiv: 2505.02156] Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents☆16Updated last week
- [Preprint] A Generalizable and Purely Unsupervised Self-Training Framework☆57Updated last week
- ☆45Updated 6 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆22Updated 3 months ago
- Code for Heima☆42Updated 3 weeks ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.☆62Updated 6 months ago
- Open-Pandora: On-the-fly Control Video Generation☆34Updated 5 months ago
- ☆22Updated 10 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆100Updated 2 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models☆44Updated 5 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs☆141Updated 2 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆69Updated last month
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆72Updated 3 weeks ago
- This repository collects research papers on learning from rewards in the context of post-training and test-time scaling of large language…☆27Updated last week
- ☆17Updated 4 months ago
- ☆98Updated 2 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆35Updated 3 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆53Updated 5 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆54Updated last week
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Updated 5 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs.☆58Updated last year
- ☆26Updated last month