Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
☆153Jan 19, 2026Updated last month
Alternatives and similar repositories for SGI-Bench
Users interested in SGI-Bench are comparing it to the libraries listed below.
- [ICLR-2026] Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".☆31Updated this week
- OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System.☆19Oct 14, 2024Updated last year
- LLMRouterBench: A Massive Benchmark and Unified Framework for LLM Routing☆38Jan 30, 2026Updated last month
- This repository contains the code and data for the paper "Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents wit…☆55Feb 7, 2026Updated 3 weeks ago
- The official implementation of COOPER: A Unified Model for Cooperative Perception and Reasoning in Spatial Intelligence.☆28Dec 30, 2025Updated 2 months ago
- Aerial Detection Toolbox☆11Jan 18, 2023Updated 3 years ago
- ☆15May 26, 2025Updated 9 months ago
- ☆15Jan 12, 2026Updated last month
- [AAAI 2026] AutoTool: Efficient Tool Selection for Large Language Model Agents☆28Dec 28, 2025Updated 2 months ago
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search☆17Jan 24, 2026Updated last month
- ☆11Oct 24, 2024Updated last year
- ☆34Aug 18, 2025Updated 6 months ago
- ☆45Nov 9, 2025Updated 3 months ago
- Official implementation for our paper: Rethinking Video Tokenization: A Conditioned Diffusion-based Approach☆14Apr 2, 2025Updated 10 months ago
- LMM for VQA, TCSVT version☆11Jul 19, 2024Updated last year
- [AAAI 2026] ReCode: Reinforced Code Knowledge Editing for API Updates☆22Jul 1, 2025Updated 8 months ago
- graphs from Draw.io☆13Sep 26, 2024Updated last year
- [ICLR 2025] Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization☆12Jan 26, 2025Updated last year
- Aligning Agentic World Models via Knowledgeable Experience Learning☆31Jan 25, 2026Updated last month
- ☆34Jan 25, 2026Updated last month
- Stable-DiffCoder is a family of lightweight open-source code DLLMs (diffusion large language models) comprising base and instruct models, …☆72Jan 23, 2026Updated last month
- T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation (ICCV'25)☆42Oct 6, 2025Updated 4 months ago
- Official implementation of the paper: "A deeper look at depth pruning of LLMs"☆15Jul 24, 2024Updated last year
- More reliable Video Understanding Evaluation☆14Sep 23, 2025Updated 5 months ago
- ☆44Feb 12, 2026Updated 2 weeks ago
- Plancraft is a Minecraft environment and agent suite for testing the planning capabilities of LLMs☆26Nov 7, 2025Updated 3 months ago
- A lightweight, reproducible toolkit for LLM-based query reformulation.☆29Jan 3, 2026Updated last month
- LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding☆34Jan 16, 2026Updated last month
- Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs☆45Jun 17, 2025Updated 8 months ago
- [ICLR'26] OF-Diff: Object Fidelity Diffusion for Remote Sensing Image Generation☆22Feb 6, 2026Updated 3 weeks ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning?☆32Aug 5, 2025Updated 6 months ago
- MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs☆38Feb 19, 2026Updated last week
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models☆39Updated this week
- The first open-domain closed-loop revisited benchmark for evaluating memory consistency and action control in world models.☆41Feb 10, 2026Updated 2 weeks ago
- Dr. MAS is an end-to-end RL training framework for multi-agent LLM systems, supporting the co-training of multiple (heterogeneous) LLMs.☆89Feb 11, 2026Updated 2 weeks ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models"☆18Oct 1, 2024Updated last year
- ☆72Jan 29, 2026Updated last month
- [EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.☆27Nov 18, 2025Updated 3 months ago