OS-Copilot / ScienceBoard
Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"
☆71 · Updated last week
Alternatives and similar repositories for ScienceBoard
Users that are interested in ScienceBoard are comparing it to the libraries listed below
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆59 · Updated 6 months ago
- ☆10 · Updated 7 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆99 · Updated last month
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 · Updated 7 months ago
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… ☆29 · Updated 7 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆59 · Updated 8 months ago
- ☆53 · Updated last week
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆65 · Updated 2 months ago
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆48 · Updated last month
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆108 · Updated 6 months ago
- Structured Chemistry Reasoning with Large Language Models ☆38 · Updated last year
- The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023) ☆31 · Updated last year
- Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? ☆28 · Updated 3 weeks ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025) ☆25 · Updated last month
- [ICLR 2025] This is the code repo for our ICLR’25 paper "RAG-DDR: Optimizing Retrieval-Augmented Generation Using Differentiable Data Rew… ☆40 · Updated 4 months ago
- [EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey" ☆125 · Updated 9 months ago
- The OlymMATH dataset ☆16 · Updated 3 weeks ago
- [ACL'25 Main] Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs ☆23 · Updated last month
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ☆136 · Updated last week
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆25 · Updated last week
- This is the code of MMOA-RAG. ☆53 · Updated last month
- ☆66 · Updated 3 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆113 · Updated 2 months ago
- Official implementation of paper "Process Reward Model with Q-value Rankings" ☆59 · Updated 4 months ago
- ☆62 · Updated last week
- [ICLR'25] Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?" ☆76 · Updated 7 months ago
- Revisiting Mid-training in the Era of RL Scaling ☆62 · Updated 2 months ago
- A Sober Look at Language Model Reasoning ☆74 · Updated last week
- The official repository of paper "AdaR1: From Long-Cot to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization" ☆15 · Updated last month
- The official repository of the Omni-MATH benchmark. ☆84 · Updated 6 months ago