yuleiqin / fantastic-data-engineering
Fantastic Data Engineering for Large Language Models
☆93 · Dec 29, 2024 · Updated last year
Alternatives and similar repositories for fantastic-data-engineering
Users that are interested in fantastic-data-engineering are comparing it to the libraries listed below
- ☆16 · Sep 4, 2025 · Updated 5 months ago
- The Code and Script of "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" · ☆35 · Jun 13, 2025 · Updated 8 months ago
- Official repository for Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning · ☆12 · Sep 2, 2024 · Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) · ☆11 · Apr 18, 2025 · Updated 9 months ago
- A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models · ☆19 · May 24, 2025 · Updated 8 months ago
- ☆31 · Feb 9, 2025 · Updated last year
- Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning · ☆47 · Jan 22, 2026 · Updated 3 weeks ago
- ☆16 · Jul 23, 2024 · Updated last year
- A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza… · ☆20 · Nov 21, 2024 · Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning · ☆188 · Jun 25, 2025 · Updated 7 months ago
- Vocabulary Parallelism · ☆25 · Mar 10, 2025 · Updated 11 months ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] · ☆588 · Dec 9, 2024 · Updated last year
- [SCIS] MULTI-Benchmark: Multimodal Understanding Leaderboard with Text and Images · ☆44 · Nov 19, 2025 · Updated 2 months ago
- A Survey on Data Selection for Language Models · ☆254 · Apr 29, 2025 · Updated 9 months ago
- NeurIPS 2024 tutorial on LLM Inference · ☆47 · Dec 10, 2024 · Updated last year
- Official implementation of MIA-DPO · ☆70 · Jan 23, 2025 · Updated last year
- A curated list of the role of small models in the LLM era · ☆111 · Sep 23, 2024 · Updated last year
- [ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective · ☆75 · Jun 25, 2025 · Updated 7 months ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models · ☆17 · Nov 4, 2025 · Updated 3 months ago
- ☆11 · Jun 4, 2021 · Updated 4 years ago
- ☆12 · Mar 5, 2025 · Updated 11 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … · ☆826 · Mar 17, 2025 · Updated 10 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… · ☆53 · Jun 24, 2024 · Updated last year
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025) · ☆35 · Jul 16, 2025 · Updated 6 months ago
- ☆74 · Oct 21, 2023 · Updated 2 years ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning" · ☆49 · Jan 30, 2026 · Updated 2 weeks ago
- Official Implementation of Video-MA2MBA · ☆12 · Dec 3, 2024 · Updated last year
- Code for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" · ☆13 · Dec 13, 2024 · Updated last year
- This is the official implementation of TAGCOS: Task-agnostic Gradient Clustered Coreset Selection for Instruction Tuning Data · ☆13 · Jul 21, 2024 · Updated last year
- ☆11 · Jan 8, 2025 · Updated last year
- LMM for VQA, TCSVT version · ☆11 · Jul 19, 2024 · Updated last year
- TNT-KID: Transformer-based Neural Tagger for Keyword Identification · ☆11 · Jul 25, 2024 · Updated last year
- [ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models · ☆15 · Jun 18, 2025 · Updated 7 months ago
- ☆32 · Jan 25, 2026 · Updated 2 weeks ago
- Solution of KDD Cup 2021 · ☆11 · Jun 16, 2021 · Updated 4 years ago
- Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures · ☆30 · Jan 29, 2026 · Updated 2 weeks ago
- CoV: Chain-of-View Prompting for Spatial Reasoning · ☆50 · Jan 23, 2026 · Updated 3 weeks ago
- ☆30 · Nov 5, 2024 · Updated last year
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 · ☆89 · Sep 26, 2024 · Updated last year