haihua0913 / awesome-dq4ml
Useful resources on data quality for machine learning and artificial intelligence.
☆21 Updated 6 months ago
Alternatives and similar repositories for awesome-dq4ml
Users who are interested in awesome-dq4ml are comparing it to the repositories listed below.
- a curated list of the role of small models in the LLM era ☆105 Updated last year
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆33 Updated last year
- The code for paper: Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search ☆59 Updated 3 months ago
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] ☆85 Updated 9 months ago
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning ☆85 Updated last year
- (ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ☆88 Updated 3 months ago
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts". ☆40 Updated 3 weeks ago
- A Toolkit for Table-based Question Answering ☆113 Updated 2 years ago
- The All-in-one Judge Models introduced by Opencompass ☆113 Updated 3 months ago
- ☆95 Updated 10 months ago
- Open replication of DeepSeek R1 for text-to-graph extraction. ☆99 Updated 8 months ago
- ☆31 Updated last year
- Data and Code for EMNLP 2025 Findings Paper "MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search" ☆71 Updated 3 months ago
- The official implementation of "LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented… ☆43 Updated 6 months ago
- [ACL 2025] An official pytorch implement of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement ☆37 Updated 4 months ago
- ☆67 Updated 6 months ago
- (ICLR 2025) AgentRefine: Enhancing Agent Generalization through Refinement Tuning ☆18 Updated 7 months ago
- FuseAI Project ☆87 Updated 8 months ago
- ☆79 Updated last year
- PGRAG ☆51 Updated last year
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆190 Updated last year
- The official GitHub page for the survey paper "A Survey on Data Augmentation in Large Model Era" ☆129 Updated last year
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆98 Updated 4 months ago
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]; ☆41 Updated last year
- ☆105 Updated 4 months ago
- [ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆82 Updated last year
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆96 Updated last week
- [NAACL 2025] Representing Rule-based Chatbots with Transformers ☆22 Updated 8 months ago
- ☆71 Updated 4 months ago
- Reformatted Alignment ☆112 Updated last year