nishiwen1214/Benchmark-leakage-detection

Official implementation of “Training on the Benchmark Is Not All You Need”.

☆40

Alternatives and similar repositories for Benchmark-leakage-detection

Users who are interested in Benchmark-leakage-detection are comparing it to the repositories listed below.

GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆61 · May 20, 2024 · Updated 2 years ago
bloomberg / MixCE-acl2023
Implementation of MixCE method described in ACL 2023 paper by Zhang et al.
☆20 · May 29, 2023 · Updated 3 years ago
LivingFutureLab / ChineseSimpleQA
☆79 · Jan 24, 2025 · Updated last year
ConCopilot / concopilot
Making AI & LLM APPs components reusable, replaceable, portable, and flexible.
☆23 · Apr 28, 2024 · Updated 2 years ago
II-Bench / II-Bench
☆28 · Oct 28, 2024 · Updated last year
sugarandgugu / GaVaMoE
Code for GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation
☆18 · Dec 7, 2024 · Updated last year
lamps-lab / Patent-figure-segmentor
☆13 · Aug 12, 2022 · Updated 3 years ago
OFA-Sys / gsm8k-ScRel
Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
☆269 · Sep 12, 2024 · Updated last year
RainBowLuoCS / DEEM
(ICLR 2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.
☆51 · Jul 1, 2025 · Updated last year
personqianduixue / comap_crawler
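MCM crawler: scrapes certificates from the Mathematical Contest in Modeling (the US collegiate mathematical modeling competition) and analyzes the extracted information with OCR.
☆16 · Jun 25, 2022 · Updated 4 years ago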
NEUIR / MemGraph
[SIGIR '25] This is the code repo for our SIGIR '25 paper: Enhancing the Patent Matching Capability of Large Language Models via Memory G…
☆19 · Apr 22, 2025 · Updated last year
taehokim20 / LLMem
LLMem: GPU Memory Estimation for Fine-Tuning Pre-Trained LLMs
☆30 · May 31, 2025 · Updated last year
flageval-baai / CMMU
[IJCAI 2024] CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning
☆26 · Feb 1, 2024 · Updated 2 years ago
JerryYin777 / Jerry_CV
☆13 · Jan 21, 2024 · Updated 2 years ago
ZJU-REAL / HBPO
☆32 · Aug 11, 2025 · Updated 10 months ago
AI-EDU-LAB / E-EVAL
Official GitHub repo for E-Eval, a Chinese K12 education evaluation benchmark for LLMs.
☆32 · Feb 19, 2024 · Updated 2 years ago
AI4Patents / IMPACT
IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents (NeurIPS 2024)
☆18 · Jul 14, 2025 · Updated 11 months ago
aryopg / mmlu-redux
☆31 · Nov 9, 2024 · Updated last year
ulab-uiuc / GraphEval
[ICLR 2025] "GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation", Tao Feng, Yihang Sun, Jiaxuan You
☆17 · Mar 18, 2025 · Updated last year
infi-coder / infibench-evaluation-harness
The Infibench variant of bigcode-evaluation-harness --- a framework for the evaluation of autoregressive code generation language models.
☆14 · Oct 19, 2024 · Updated last year
ZJU-REAL / KnowU-Bench
Official code for "KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation"
☆72 · Jun 13, 2026 · Updated 3 weeks ago
L4Clippers / Patent-Image-Retrieval-Transformer-DML
☆12 · Jul 21, 2025 · Updated 11 months ago
SunbowLiu / PTvsBT
On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021)
☆13 · Nov 21, 2021 · Updated 4 years ago
liseami / liseami.github.io
Personal website of 赵纯想 (Zhao Chunxiang)
☆11 · Nov 3, 2024 · Updated last year
gpengzhi / CrossConST-MT
Code for Findings of ACL 2023 paper "Improving Zero-shot Multilingual Neural Machine Translation by Leveraging Cross-lingual Consistency …
☆10 · Jul 18, 2023 · Updated 2 years ago
RainBowLuoCS / MMEvol
(ACL 2025) 🔥🔥🔥Code for "Empowering Multimodal Large Language Models with Evol-Instruct"
☆21 · May 15, 2025 · Updated last year
nex-agi / Nex-N1
☆116 · Dec 5, 2025 · Updated 7 months ago
houhongyi / RM-DATASET
☆12 · Aug 3, 2020 · Updated 5 years ago
ZJU-REAL / GUI-RCPO
[AAAI 2026] Test-Time Reinforcement Learning for GUI Grounding via Region Consistency https://arxiv.org/abs/2508.05615
☆67 · Nov 8, 2025 · Updated 8 months ago
yule-BUAA / MergeLLM
Codes for Merging Large Language Models
☆37 · Aug 7, 2024 · Updated last year
wizard-III / Archer2.0
Archer2.0 evolves from its predecessor by introducing ASPO, which overcomes fundamental PPO-Clip limitations to prevent premature converg…
☆31 · Oct 10, 2025 · Updated 8 months ago
lemon-prog123 / LongRePS
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision
☆19 · Apr 1, 2025 · Updated last year
rbawden / mt-bigscience
Evaluation results for Machine Translation within the BigScience project
☆11 · May 15, 2023 · Updated 3 years ago
ictnlp / FA-DAT
Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"
☆14 · Mar 1, 2023 · Updated 3 years ago
jinlanfu / Polyglot_Prompt
Code and dataset for Polyglot Prompting: Multilingual Multitask Prompt Training.
☆18 · Dec 7, 2022 · Updated 3 years ago
tatsu-lab / test_set_contamination
☆42 · Nov 7, 2023 · Updated 2 years ago
TIGER-AI-Lab / TheoremQA
The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023)
☆39 · May 15, 2024 · Updated 2 years ago
Adaxry / Unified_Layer_Skipping
☆15 · Apr 11, 2024 · Updated 2 years ago
LaVi-Lab / LongContextReasoner
[ACL 2024] Making Long-Context Language Models Better Multi-Hop Reasoners
☆20 · May 28, 2024 · Updated 2 years ago