RUCKBReasoning / SpreadsheetBench
SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation
☆19 · Updated 2 weeks ago
Alternatives and similar repositories for SpreadsheetBench:
Users interested in SpreadsheetBench are comparing it to the libraries listed below.
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆50 · Updated 2 months ago
- ☆47 · Updated 4 months ago
- Open source code of the paper: "OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain" ☆55 · Updated 4 months ago
- RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation. ☆126 · Updated last week
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆99 · Updated last week
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated last year
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 6 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆79 · Updated 2 months ago
- ☆118 · Updated 10 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆135 · Updated 5 months ago
- ☆22 · Updated 3 months ago
- PGRAG ☆48 · Updated 9 months ago
- Reformatted Alignment ☆115 · Updated 7 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆65 · Updated 4 months ago
- Code for Search-in-the-Chain: Towards Accurate, Credible and Traceable Large Language Models for Knowledge-intensive Tasks ☆55 · Updated last year
- Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context" ☆69 · Updated 8 months ago
- [ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use ☆86 · Updated last year
- This is the code of MMOA-RAG. ☆50 · Updated last month
- [NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token ☆136 · Updated 9 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆26 · Updated last year
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆123 · Updated 5 months ago
- ☆44 · Updated 5 months ago
- The code of arXiv paper: "CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis" ☆24 · Updated 3 months ago
- This is the code repo for our paper "Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents". ☆106 · Updated 6 months ago
- Code for paper Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding ☆63 · Updated 10 months ago
- NaturalCodeBench (Findings of ACL 2024) ☆63 · Updated 6 months ago
- AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso… ☆101 · Updated last month
- [COLM'24] Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration ☆25 · Updated 6 months ago
- [ACL 2023] This is the code repo for our ACL'23 paper "Augmentation-Adapted Retriever Improves Generalization of Language Models as Gener… ☆60 · Updated 9 months ago
- Code for the 2024 arXiv publication "Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Mo… ☆24 · Updated 9 months ago