swj0419 / detect-pretrain-code
This repository provides an original implementation of "Detecting Pretraining Data from Large Language Models" by *Weijia Shi, *Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, and Luke Zettlemoyer.
☆208 Updated last year
Related projects
Alternatives and complementary repositories for detect-pretrain-code
- A Survey on Data Selection for Language Models ☆182 Updated last month
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆91 Updated 4 months ago
- The Paper List on Data Contamination for Large Language Models Evaluation. ☆75 Updated this week
- All available datasets for Instruction Tuning of Large Language Models ☆237 Updated 11 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆145 Updated 5 months ago
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆144 Updated 8 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆103 Updated 6 months ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆374 Updated last month
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆127 Updated 2 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆219 Updated 2 months ago
- Self-Alignment with Principle-Following Reward Models ☆148 Updated 8 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆132 Updated 4 months ago
- DSIR large-scale data selection framework for language model training ☆230 Updated 7 months ago
- BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs). ☆111 Updated last year
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆213 Updated last year
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆83 Updated 4 months ago
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆428 Updated 6 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆124 Updated 3 weeks ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆104 Updated 5 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆125 Updated 2 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆195 Updated 2 weeks ago
- Generative Judge for Evaluating Alignment ☆217 Updated 10 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆67 Updated 5 months ago
- [EMNLP 2023] Adapting Language Models to Compress Long Contexts ☆277 Updated 2 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆144 Updated 5 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge ☆106 Updated last month
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆75 Updated last month
- Reformatted Alignment ☆112 Updated last month
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ☆92 Updated last week
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆117 Updated 4 months ago