A dataset for training and evaluating LLMs on decision making about "when (not) to call" functions
☆57Apr 29, 2025Updated 10 months ago
Alternatives and similar repositories for When2Call
Users who are interested in When2Call are comparing it to the repositories listed below
- IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent☆69May 13, 2025Updated 10 months ago
- ☆34May 24, 2025Updated 9 months ago
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning"☆29Jun 3, 2025Updated 9 months ago
- The first large scale formally verified reasoning dataset for Verilog☆21May 16, 2025Updated 10 months ago
- FamilyTool benchmark☆13Sep 10, 2025Updated 6 months ago
- ☆25May 28, 2025Updated 9 months ago
- This is a repo consisting of papers about LLMs' perception of their knowledge boundaries; Uncertainty Quantification; Honesty Alignment; …☆24Nov 25, 2025Updated 3 months ago
- A curated list of cutting-edge research papers and resources on Long Chain-of-Thought (CoT) Reasoning with Tools.☆46Dec 17, 2025Updated 3 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated 8 months ago
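- Seat-grabbing program for the Wuhan University Information Science branch library; supports continuous background monitoring, grabbing window seats and seats with computers, and automatic shutdown after a seat is successfully grabbed☆15Dec 8, 2022Updated 3 years ago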
- 🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning☆328Jan 3, 2026Updated 2 months ago
- VehicleWorld is the first comprehensive multi-device environment for intelligent vehicle interaction that accurately models the complex, …☆21Sep 16, 2025Updated 6 months ago
- chat mind ai user base☆11Oct 26, 2024Updated last year
- A tool for an analysis of LLM generations.☆42Oct 13, 2025Updated 5 months ago
- Real-time monitoring of moving objects, deployed on a Raspberry Pi. (Open-source version)☆18Jan 11, 2023Updated 3 years ago
- 🍎Wende Chinese QA system (experimental)☆10Jun 1, 2021Updated 4 years ago
- UnifiedToolHub is a comprehensive project supporting LLM-based tool use, designed to unify various tool-use dataset formats and provide t…☆19Jul 23, 2025Updated 7 months ago
- Generate Python docstrings automatically with LLM and syntax trees☆20Jun 13, 2025Updated 9 months ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba☆37Oct 16, 2025Updated 5 months ago
- DataSciBench: An LLM Agent Benchmark for Data Science☆54Jan 21, 2026Updated 2 months ago
- XmodelLM☆38Nov 19, 2024Updated last year
- Recursive Abstractive Processing for Tree-Organized Retrieval☆10May 30, 2024Updated last year
- bert for chinese text classification☆10Dec 11, 2018Updated 7 years ago
- A SystemVerilog Assertion dataset to improve hardware verification with LLMs.☆22Jun 9, 2025Updated 9 months ago
- This project is the e-book version of July's 《程序员编程艺术》 (The Art of Programming for Programmers)☆11Jan 9, 2014Updated 12 years ago
- Paper Reading Summary(mainly NLP related papers)☆11Nov 6, 2019Updated 6 years ago
- ☆304Aug 12, 2025Updated 7 months ago
- ☆11Jun 11, 2024Updated last year
- APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding☆14Jul 22, 2024Updated last year
- ☆15May 12, 2025Updated 10 months ago
- NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation☆13May 24, 2025Updated 9 months ago
- This is the official implementation for "AUTOPR: LET'S AUTOMATE YOUR ACADEMIC PROMOTION!".☆97Oct 16, 2025Updated 5 months ago
- GlotEval: a unified evaluation toolkit designed to benchmark multilingual Large Language Models (LLMs) in a language-specific way☆18Nov 4, 2025Updated 4 months ago
- [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs☆17May 21, 2025Updated 10 months ago
- Official repository of the paper MPMQA: Multimodal Question Answering on Product Manuals (AAAI 2023)☆19Nov 28, 2022Updated 3 years ago
- ☆22Jan 13, 2025Updated last year
- ☆13Feb 11, 2019Updated 7 years ago
- ☆29Apr 8, 2025Updated 11 months ago
- The official implementation of "Depth-Breadth Synergy in RLVR: Unlocking LLM Reasoning Gains with Adaptive Exploration"☆24Feb 4, 2026Updated last month