APIBench is a benchmark for evaluating API recommendation approaches, released with the paper "Revisiting, Benchmarking and Exploring API Recommendation: How Far Are We?".
☆66 · Apr 3, 2023 · Updated 2 years ago
Alternatives and similar repositories for APIBench
Users interested in APIBench are comparing it to the libraries listed below.
- FOCUS is a context-aware collaborative-filtering system that exploits cross relationships among OSS projects to suggest the inclusion of … ☆21 · Jun 14, 2023 · Updated 2 years ago
- ☆12 · Oct 29, 2022 · Updated 3 years ago
- This is the tool released with the ICSE 2024 paper "Domain Knowledge Matters: Improving Prompts with Fix Templates for Repairing Python Type Er… ☆17 · Jun 5, 2023 · Updated 2 years ago
- ☆10 · Apr 15, 2023 · Updated 2 years ago
- ☆14 · Mar 13, 2021 · Updated 4 years ago
- ☆17 · Dec 9, 2022 · Updated 3 years ago
- Code and data for the paper "Apathetic or Empathetic? Evaluating LLMs' Emotional Alignments with Humans" ☆119 · Jan 25, 2026 · Updated last month
- ☆17 · Feb 22, 2024 · Updated 2 years ago
- Running inference on the ZeroSCROLLS benchmark ☆20 · Apr 18, 2024 · Updated last year
- ☆18 · Apr 15, 2024 · Updated last year
- ☆19 · Jun 13, 2024 · Updated last year
- ☆19 · Dec 8, 2022 · Updated 3 years ago
- ☆20 · Mar 6, 2023 · Updated 2 years ago
- A dataset of reproducible breaking dependency updates, SANER 2024 (https://doi.org/10.1109/SANER60148.2024.00024) ☆21 · Feb 20, 2026 · Updated last week
- Code and data for the paper "Competing Large Language Models in Multi-Agent Gaming Environments" ☆95 · Jan 26, 2026 · Updated last month
- ☆56 · Aug 10, 2024 · Updated last year
- [ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use ☆110 · Mar 21, 2024 · Updated last year
- ☆20 · Oct 25, 2023 · Updated 2 years ago
- Chinese Vision-Language Understanding Evaluation ☆23 · Dec 26, 2024 · Updated last year
- ☆21 · May 5, 2020 · Updated 5 years ago
- ☆37 · Jan 25, 2024 · Updated 2 years ago
- This is the code repository for our ICPC 2021 paper "Improving Code Summarization with Block-wise Abstract Syntax Tree Splitting" ☆24 · Jan 3, 2023 · Updated 3 years ago
- A small and fast image rescaling library with SIMD support ☆22 · Aug 11, 2025 · Updated 6 months ago
- Source code and datasets for "How well do Large Language Models perform in Arithmetic tasks?" ☆57 · Apr 17, 2023 · Updated 2 years ago
- The first Object-Oriented Programming (OOP) evaluation benchmark for LLMs ☆27 · Jan 15, 2025 · Updated last year
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating repository-level test generation by LLMs ☆71 · Jan 15, 2026 · Updated last month
- [EMNLP 2024] Benchmark for "Large Language Models Are Poor Clinical Decision-Makers: A Comprehensive Benchmark" ☆36 · Sep 18, 2025 · Updated 5 months ago
- Dump the call graph produced by FlowDroid's static analysis ☆23 · Jun 22, 2017 · Updated 8 years ago
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆119 · Jun 12, 2025 · Updated 8 months ago
- Official repo for the paper "VCR: Visual Caption Restoration". See arxiv.org/pdf/2406.06462 for details. ☆32 · Feb 26, 2025 · Updated last year
- Data for the paper "Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness" ☆33 · May 3, 2023 · Updated 2 years ago
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆286 · Aug 19, 2023 · Updated 2 years ago
- A curated list of software engineering research, datasets, and tools ☆33 · Dec 16, 2022 · Updated 3 years ago
- A dataset for training and evaluating LLMs on deciding "when (not) to call" functions ☆55 · Apr 29, 2025 · Updated 10 months ago
- ☆30 · Nov 23, 2020 · Updated 5 years ago
- Code for "Natural Language to Code Translation with Execution" ☆41 · Nov 2, 2022 · Updated 3 years ago
- Functional clone detection (currently maintained version) ☆34 · Sep 30, 2022 · Updated 3 years ago
- Chinese large language model evaluation, round 3 ☆35 · Dec 30, 2025 · Updated 2 months ago
- A library for building intraprocedural PDGs for Java programs ☆36 · Sep 28, 2023 · Updated 2 years ago