A list of papers about data quality in Large Language Models (LLMs)
☆27Dec 14, 2023Updated 2 years ago
Alternatives and similar repositories for Data-Centric_LLM_Studies
Users who are interested in Data-Centric_LLM_Studies are comparing it to the repositories listed below
- Official Code Repository for [AutoScale📈: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*…☆13Aug 8, 2025Updated 7 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated 2 years ago
- ☆23Aug 7, 2023Updated 2 years ago
- brain to speech☆10Nov 7, 2025Updated 4 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve …☆37May 31, 2025Updated 9 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)☆35Mar 7, 2025Updated last year
- [EMNLP 2025 main] C3 Benchmark: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations☆30Dec 24, 2025Updated 2 months ago
- ☆54May 19, 2025Updated 9 months ago
- Foundational deep learning course☆14May 4, 2018Updated 7 years ago
- Python library to compute functional connectivity measures from EEG☆12Oct 14, 2023Updated 2 years ago
- Python script to obtain dynamic functional connectivity metrics, after using a sliding window approach, statistical analyses to test for …☆12Sep 10, 2024Updated last year
- ☆11Aug 20, 2025Updated 6 months ago
- ☆14Jan 24, 2025Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025)☆11Apr 18, 2025Updated 10 months ago
- Official PyTorch implementation of "Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning i…☆31Jan 9, 2026Updated 2 months ago
- ☆13May 21, 2023Updated 2 years ago
- We introduce XBrainLab, an open-source user-friendly software, for accelerated interpretation of neural patterns from EEG data based on c…☆13Dec 5, 2025Updated 3 months ago
- UnitEval is a benchmarking and evaluation tool for AutoDev Coder.☆13Jan 2, 2024Updated 2 years ago
- ICML 2024 - Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning☆10Jul 16, 2024Updated last year
- Dataflow-MM, multi-media operators for Dataflow. We aim to prepare data for Multimodal Large Language Models.☆31Feb 25, 2026Updated last week
- 2018 Yunyi Cup (云移杯) scenic-spot reputation review score prediction 7/1186☆11Jul 16, 2018Updated 7 years ago
- Mental image reconstruction from human brain activity☆14Jul 1, 2024Updated last year
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization☆13Mar 20, 2025Updated 11 months ago
- ☆13Aug 11, 2024Updated last year
- LongAttn: Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 7 months ago
- [ICLR 2025] Released code for paper "Spurious Forgetting in Continual Learning of Language Models"☆59May 9, 2025Updated 10 months ago
- ☆11May 18, 2025Updated 9 months ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference☆18Jun 19, 2025Updated 8 months ago
- ☆20Aug 14, 2025Updated 6 months ago
- Mimic interview☆10Jun 26, 2020Updated 5 years ago
- An Open Source implementation of Notebook LM.☆37Feb 27, 2026Updated last week
- LCA-on-the-line (ICML 2024 Oral)☆13Feb 13, 2025Updated last year
- Code for LLM_Catastrophic_Forgetting via SAM.☆11Jun 7, 2024Updated last year
- MedARC fMRI foundation model☆30Jan 15, 2026Updated last month
- A Translation Task using TurboTransformers☆11Dec 17, 2020Updated 5 years ago
- Implements High-Gamma dataset decoding using Filter Bank Common Spatial Pattern with rLDA classification and Neural Networks.☆11Mar 14, 2019Updated 6 years ago
- BERT-family models, search, pruning, distillation☆13Sep 10, 2020Updated 5 years ago
- ☆13Jan 22, 2025Updated last year
- This is the code repo for our paper "Say More with Less: Understanding Prompt Learning Behaviors through Gist Compression".☆12Feb 27, 2024Updated 2 years ago