princeton-nlp / QuRating
[ICML 2024] Selecting High-Quality Data for Training Language Models
★ 141 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for QuRating
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training ★ 87 · Updated last month
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ★ 72 · Updated 7 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ★ 142 · Updated 4 months ago
- ★ 89 · Updated last month
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ★ 123 · Updated 2 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ★ 46 · Updated 3 months ago
- [EMNLP 2024 (Oral)] Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA ★ 89 · Updated last month
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models ★ 47 · Updated last month
- Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning ★ 152 · Updated 9 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ★ 84 · Updated 3 months ago
- Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning. ★ 125 · Updated last year
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ★ 166 · Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ★ 111 · Updated last week
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models ★ 53 · Updated 3 months ago
- [ICML'2024] Can AI Assistants Know What They Don't Know? ★ 70 · Updated 9 months ago
- Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning ★ 32 · Updated 9 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ★ 94 · Updated 6 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ★ 59 · Updated 6 months ago
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ★ 46 · Updated last year
- ★ 15 · Updated last month
- ★ 59 · Updated last year
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ★ 91 · Updated 4 months ago
- [ACL 2023] This is the code repo for our ACL'23 paper "Augmentation-Adapted Retriever Improves Generalization of Language Models as Gener… ★ 58 · Updated 3 months ago
- [SIGIR'24] The official implementation code of MOELoRA. ★ 123 · Updated 3 months ago
- A Survey on Data Selection for Language Models ★ 178 · Updated 3 weeks ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ★ 67 · Updated 5 months ago
- ★ 98 · Updated 5 months ago
- An unofficial implementation of Self-Alignment with Instruction Backtranslation. ★ 131 · Updated 4 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ★ 43 · Updated last week
- ★ 70 · Updated 10 months ago