This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
☆27 · Apr 21, 2023 · Updated 2 years ago
Alternatives and similar repositories for decontamination
Users who are interested in decontamination are comparing it to the libraries listed below.
- ☆22 · Aug 27, 2023 · Updated 2 years ago
- An implementation of "Subspace Representations for Soft Set Operations and Sentence Similarities" (NAACL 2024) · ☆10 · May 31, 2024 · Updated last year
- To be readable without enhancing english power. · ☆10 · Jul 22, 2020 · Updated 5 years ago
- An experiment to see if chatgpt can improve the output of the stanford alpaca dataset · ☆12 · Mar 29, 2023 · Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… · ☆31 · Aug 25, 2023 · Updated 2 years ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval · ☆87 · Sep 17, 2024 · Updated last year
- A library for squeakily cleaning and filtering language datasets. · ☆50 · Jul 10, 2023 · Updated 2 years ago
- QLoRA with Enhanced Multi GPU Support · ☆38 · Aug 8, 2023 · Updated 2 years ago
- Scraping Leetcode using selenium to get upvotes and downvotes in each question · ☆14 · Jan 15, 2020 · Updated 6 years ago
- Use sync mode Playwright interactively, inside a Jupyter notebook · ☆19 · Jan 29, 2026 · Updated last month
- ☆22 · Oct 12, 2021 · Updated 4 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) · ☆48 · Aug 2, 2021 · Updated 4 years ago
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ · ☆35 · May 13, 2022 · Updated 3 years ago
- Full finetuning of large language models without large memory requirements · ☆94 · Sep 22, 2025 · Updated 5 months ago
- Checkpointable dataset utilities for foundation model training · ☆32 · Jan 29, 2024 · Updated 2 years ago
- evol augment any dataset online · ☆61 · Aug 3, 2023 · Updated 2 years ago
- data cleaning and curation for unstructured text · ☆329 · Aug 6, 2024 · Updated last year
- The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as… · ☆18 · Sep 17, 2025 · Updated 5 months ago
- ☆11 · Jun 22, 2016 · Updated 9 years ago
- ☆13 · Oct 5, 2025 · Updated 4 months ago
- Support Continual pre-training & Instruction Tuning forked from llama-recipes · ☆34 · Feb 17, 2024 · Updated 2 years ago
- Gibsonify — Collect nutritional data using Gibson's method! · ☆11 · Oct 28, 2023 · Updated 2 years ago
- Generate textbook-quality synthetic LLM pretraining data · ☆509 · Oct 19, 2023 · Updated 2 years ago
- Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes. · ☆83 · Sep 10, 2023 · Updated 2 years ago
- VibEx (vx) is a developer-friendly CLI tool that streamlines the process of working with AI coding assistants. It helps developers prepar… · ☆28 · May 17, 2025 · Updated 9 months ago
- A Chrome extension for quick and compact access to your bookmarks. · ☆10 · Jun 3, 2017 · Updated 8 years ago
- A Java Entity-Component-System game engine. · ☆11 · Dec 24, 2019 · Updated 6 years ago
- 🪝PISCES - Precise In-Parameter Suppression for Concept EraSure in Large Language Models · ☆12 · May 30, 2025 · Updated 9 months ago
- A python implementation of the sinaplot using matplotlib and seaborn · ☆11 · Jun 5, 2018 · Updated 7 years ago
- ☆42 · Apr 30, 2024 · Updated last year
- A framework for few-shot evaluation of autoregressive language models. · ☆153 · Sep 13, 2024 · Updated last year
- Big Data and Machine Intelligence, Spring 2021. · ☆12 · Jul 2, 2021 · Updated 4 years ago
- mySight is myspectral.com Spectruino analyzer for light spectra in UV/VIS/NIR · ☆19 · Dec 26, 2013 · Updated 12 years ago
- Basic Android data logger application · ☆12 · Jul 24, 2014 · Updated 11 years ago
- a free and secure peer to peer meeting application · ☆19 · Sep 17, 2022 · Updated 3 years ago
- ☆12 · Feb 3, 2026 · Updated 3 weeks ago
- Backend for skillgraph - a skill based framework for building agents that work. · ☆28 · Nov 10, 2025 · Updated 3 months ago
- Exploring the world of Generative AI through Google’s 5-Day Intensive Course. Covering foundational LLMs, prompt engineering, embeddings,… · ☆18 · May 3, 2025 · Updated 9 months ago
- Vtabs provide the vertical tabs for the chrome browser. · ☆11 · Aug 12, 2024 · Updated last year