haihua0913 / awesome-dq4mlLinks
Useful resources on data quality for machine learning and artificial intelligence.
☆19Updated last month
Alternatives and similar repositories for awesome-dq4ml
Users that are interested in awesome-dq4ml are comparing it to the libraries listed below
Sorting:
- RuleRAG: Rule-guided Retrieval-Augmented Generation with Language Models for Question Answering☆22Updated 6 months ago
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts".☆32Updated 2 months ago
- ☆28Updated last year
- Hierarchical RAG☆12Updated this week
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆32Updated last year
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM"☆14Updated 2 months ago
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation☆19Updated this week
- a curated list of the role of small models in the LLM era☆100Updated 8 months ago
- ☆65Updated 2 months ago
- OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation☆75Updated 2 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆76Updated 7 months ago
- Repository for the NeurIPS 2024 paper "SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up…☆24Updated 5 months ago
- Official repository of Graph RAG-Tool Fusion and ToolLinkOS dataset.☆12Updated 3 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.☆69Updated 8 months ago
- [ACL 2025] An official pytorch implement of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement☆27Updated last week
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models☆45Updated 6 months ago
- 超简单复现Deepseek-R1-Zero和Deepseek-R1,以「24点游戏」为例。通过zero-RL、SFT以及SFT+RL,以激发LLM的自主验证反思能力。 About Clean, minimal, accessible reproduction of Dee…☆18Updated 2 months ago
- Open replication of DeepSeek R1 for text-to-graph extraction.☆94Updated 4 months ago
- A Toolkit for Table-based Question Answering☆112Updated last year
- Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents, CVPR 2025☆18Updated 4 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆38Updated 8 months ago
- ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory☆95Updated last month
- ☆29Updated 9 months ago
- ☆68Updated 11 months ago
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024]☆83Updated 4 months ago
- ☆47Updated 3 months ago
- ☆17Updated 6 months ago
- [WWW 2025] A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System.☆70Updated last week
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet…☆18Updated 9 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆78Updated 6 months ago