Lightweight tool to identify data contamination in LLM evaluation
☆53 · Mar 8, 2024 · Updated last year
Alternatives and similar repositories for Contamination_Detector
Users who are interested in Contamination_Detector are comparing it to the libraries listed below.
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆51 · Oct 31, 2024 · Updated last year
- This is the implementation of LeCo ☆31 · Jan 20, 2025 · Updated last year
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences" ☆20 · Oct 2, 2024 · Updated last year
- LLM benchmarks ☆13 · Feb 22, 2024 · Updated 2 years ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- Xlore2.0 Code [BaiduExtractor, HudongExtractor, WikiExtractor, XloreData, XloreWeb] ☆12 · Apr 5, 2017 · Updated 8 years ago
- 📊 A simple command-line utility for querying and monitoring GPU status ☆14 · Aug 3, 2023 · Updated 2 years ago
- ☆10 · Feb 6, 2025 · Updated last year
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings. ☆31 · Dec 6, 2023 · Updated 2 years ago
- ☆17 · Jul 12, 2025 · Updated 7 months ago
- ☆14 · Aug 15, 2024 · Updated last year
- ☆19 · Sep 16, 2025 · Updated 5 months ago
- ☆13 · Jun 26, 2024 · Updated last year
- Code repo for FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs. ☆32 · Nov 5, 2025 · Updated 4 months ago
- This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'. ☆15 · Feb 12, 2024 · Updated 2 years ago
- ☆17 · Feb 20, 2026 · Updated last week
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆23 · Feb 9, 2025 · Updated last year
- [EMNLP 2022] Code for our paper “ZeroGen: Efficient Zero-shot Learning via Dataset Generation”. ☆16 · Feb 18, 2022 · Updated 4 years ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state. ☆72 · May 22, 2025 · Updated 9 months ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation… ☆22 · Nov 8, 2023 · Updated 2 years ago
- NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks ☆20 · May 10, 2022 · Updated 3 years ago
- ☆20 · Dec 22, 2023 · Updated 2 years ago
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code. ☆22 · Nov 26, 2022 · Updated 3 years ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering" ☆19 · Oct 4, 2022 · Updated 3 years ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆18 · Dec 22, 2023 · Updated 2 years ago
- ☆25 · Nov 19, 2025 · Updated 3 months ago
- ☆51 · Mar 2, 2024 · Updated 2 years ago
- ☆19 · Mar 6, 2023 · Updated 2 years ago
- Official Code for ACL 2023 paper: "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confid… ☆23 · May 8, 2023 · Updated 2 years ago
- Wikipedia based dataset to train relationship classifiers and fact extraction models ☆26 · May 25, 2021 · Updated 4 years ago
- ☆27 · Jul 11, 2024 · Updated last year
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆24 · Nov 25, 2024 · Updated last year
- ☆25 · Aug 23, 2024 · Updated last year
- 🤖ConvRe🤯: An Investigation of LLMs’ Inefficacy in Understanding Converse Relations (EMNLP 2023) ☆24 · Oct 10, 2023 · Updated 2 years ago
- ☆27 · Jul 20, 2024 · Updated last year
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆30 · Mar 5, 2024 · Updated 2 years ago
- Codebase for decoding compressed trust. ☆25 · May 7, 2024 · Updated last year
- Code Release for "Broken Neural Scaling Laws" (BNSL) paper ☆59 · Oct 29, 2023 · Updated 2 years ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆226 · Nov 16, 2024 · Updated last year