IAAR-Shanghai / SEAP
☆20 · Updated last month
Alternatives and similar repositories for SEAP:
Users who are interested in SEAP are comparing it to the repositories listed below.
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆75 · Updated last week
- This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models" ☆48 · Updated 9 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆65 · Updated 2 months ago
- The demo, code and data of FollowRAG ☆71 · Updated this week
- ☆22 · Updated 9 months ago
- ☆44 · Updated 6 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning? ☆25 · Updated last month
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆64 · Updated last week
- Large Language Models Can Self-Improve in Long-context Reasoning ☆68 · Updated 5 months ago
- [preprint] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆43 · Updated 4 months ago
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆39 · Updated 4 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆20 · Updated 2 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆61 · Updated 6 months ago
- ☆17 · Updated last week
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ☆33 · Updated 9 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 · Updated 4 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆49 · Updated 5 months ago
- ☆22 · Updated 9 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Updated 4 months ago
- M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ☆56 · Updated 4 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆68 · Updated last month
- ☆55 · Updated 6 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆116 · Updated last month
- ☆18 · Updated 4 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆133 · Updated last month
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆81 · Updated 2 months ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆42 · Updated 5 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆76 · Updated 3 months ago
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆35 · Updated 3 months ago
- The code of arxiv paper: "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis" ☆24 · Updated 3 months ago