AQA6666 / SCP-116K-open
☆25Updated 2 months ago
Alternatives and similar repositories for SCP-116K-open:
Users who are interested in SCP-116K-open are comparing it to the repositories listed below
- ☆55Updated 6 months ago
- A framework for editing the CoTs for better factuality☆51Updated last year
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis☆41Updated 2 weeks ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations☆90Updated 3 weeks ago
- Knowledge-Reasoning Synergy Reinforcement Learning.☆35Updated 2 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆48Updated 10 months ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…☆39Updated 9 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.☆62Updated 6 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios☆65Updated 5 months ago
- ☆49Updated last year
- ☆81Updated last year
- ☆143Updated 10 months ago
- ☆31Updated 5 months ago
- This is the implementation of LeCo☆31Updated 3 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆89Updated 2 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆72Updated 2 weeks ago
- The official repository of the Omni-MATH benchmark.☆83Updated 4 months ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024)☆48Updated last year
- Official completion of “Training on the Benchmark Is Not All You Need”.☆31Updated 4 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆57Updated 6 months ago
- ☆102Updated 5 months ago
- ☆115Updated 2 weeks ago
- The code for the arXiv paper "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis"☆24Updated 4 months ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…☆103Updated last month
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs.☆42Updated 10 months ago
- ☆45Updated 3 weeks ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆119Updated 6 months ago
- Code and data for QueryAgent (ACL 2024)☆21Updated 4 months ago
- ☆97Updated last year
- ☆153Updated last month