dada-qin / Data-Centric_LLM_Studies
A list of papers about data quality in Large Language Models (LLMs)
☆27Dec 14, 2023Updated 2 years ago
Alternatives and similar repositories for Data-Centric_LLM_Studies
Users who are interested in Data-Centric_LLM_Studies are comparing it to the repositories listed below
- Survey on Data-centric Large Language Models☆88Jul 8, 2024Updated last year
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆13Aug 8, 2025Updated 6 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated last year
- ☆23Aug 7, 2023Updated 2 years ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)☆35Mar 7, 2025Updated 11 months ago
- ☆32Jun 5, 2025Updated 8 months ago
- This repository contains source code and a high-quality test dataset for "Automated Commit Message Generation with Large Language Models.…☆10Nov 6, 2025Updated 3 months ago
- ☆53May 19, 2025Updated 8 months ago
- A foundational course on deep learning☆14May 4, 2018Updated 7 years ago
- Python library to compute functional connectivity measures from EEG☆12Oct 14, 2023Updated 2 years ago
- Real-time multi-language unit test generation tool via LSP☆31Updated this week
- Python script to obtain dynamic functional connectivity metrics, after using a sliding window approach, statistical analyses to test for …☆12Sep 10, 2024Updated last year
- ☆11Aug 20, 2025Updated 5 months ago
- 2018 Yunyi Cup scenic-area review score prediction, 7/1186☆11Jul 16, 2018Updated 7 years ago
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025)☆11Apr 18, 2025Updated 9 months ago
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder.☆13Jan 2, 2024Updated 2 years ago
- Just a demonstration of some sampling techniques (rejection sampling, importance sampling, sampling importance resampling, Metropolis sam…☆11Aug 24, 2013Updated 12 years ago
- ☆14Jan 24, 2025Updated last year
- [ICML 2025] Repository for M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Predictive Embedding Architecture☆17Nov 4, 2025Updated 3 months ago
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization☆13Mar 20, 2025Updated 10 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 6 months ago
- Unity example based on OpenVR's Tracked Camera sample code.☆11Jul 13, 2016Updated 9 years ago
- ☆13May 21, 2023Updated 2 years ago
- Mental image reconstruction from human brain activity☆14Jul 1, 2024Updated last year
- Dataflow-MM, multi-media operators for Dataflow. We aim to prepare data for Multimodal Large Language Models.☆28Updated this week
- We introduce XBrainLab, an open-source user-friendly software, for accelerated interpretation of neural patterns from EEG data based on c…☆13Dec 5, 2025Updated 2 months ago
- ☆13Aug 11, 2024Updated last year
- [ICLR 2025] Released code for paper "Spurious Forgetting in Continual Learning of Language Models"☆59May 9, 2025Updated 9 months ago
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.☆13Jan 9, 2024Updated 2 years ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference☆17Jun 19, 2025Updated 7 months ago
- LCA-on-the-line (ICML 2024 Oral)☆13Feb 13, 2025Updated last year
- Official code repository for PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation, EMNLP 2023☆12Dec 13, 2023Updated 2 years ago
- Code for "General-Purpose Brain Foundation Models for Time-Series Neuroimaging Data"☆15Dec 14, 2024Updated last year
- ☆13Jan 22, 2025Updated last year
- A Translation Task using TurboTransformers☆11Dec 17, 2020Updated 5 years ago
- ☆11May 18, 2025Updated 8 months ago
- MedARC fMRI foundation model☆30Jan 15, 2026Updated last month
- Code for LLM_Catastrophic_Forgetting via SAM.☆11Jun 7, 2024Updated last year
- Bingyan (冰眼) cold chain project☆14Feb 1, 2021Updated 5 years ago