Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
☆663Mar 17, 2026Updated this week
Alternatives and similar repositories for dingo
Users who are interested in dingo are comparing it to the libraries listed below.
- Data annotation toolbox that supports image, audio, and video data.☆1,518Updated this week
- The Open-Source Data Annotation Platform☆1,194Feb 19, 2025Updated last year
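- WanJuan3.0 (“万卷·丝路”, "WanJuan Silk Road") is a comprehensive plain-text corpus that collects publicly available web content, literature, patents, and other materials from multiple countries and regions. The data totals more than 1.2 TB and over 300B tokens, placing it at an internationally leading level. The first open-source release consists mainly of five subsets (Thai, Russian, Arabic, Korean, and Vietnamese); the data of each subset…☆43Feb 13, 2025Updated last year
- Data annotation component library, provided as NPM packages☆147Nov 19, 2025Updated 4 months ago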
- Converts documents (pdf/html/doc/docx/ppt/pptx) to Markdown☆48Jul 23, 2024Updated last year
- [ACL 2025] The official PyTorch implementation of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement☆39May 28, 2025Updated 9 months ago
- SDK of OpenDataLab - https://opendatalab.org.cn☆59Jul 31, 2025Updated 7 months ago
- MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.☆24Dec 11, 2024Updated last year
- Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models".☆121Feb 7, 2026Updated last month
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆1,569Feb 27, 2026Updated 3 weeks ago
- ☆23Nov 4, 2024Updated last year
- A Python package for interacting with the MinerU Vision-Language Model.☆109Feb 5, 2026Updated last month
- OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …☆6,765Updated this week
- Zone Evaluation: Revealing Spatial Bias in Object Detection (TPAMI 2024)☆46Dec 6, 2024Updated last year
- Web archiving utility library☆11Mar 11, 2026Updated last week
- A lightweight framework for building LLM-based agents☆2,231Updated this week
- Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷☆6,079Updated this week
- ☆19Oct 28, 2025Updated 4 months ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs.☆7,694Mar 13, 2026Updated last week
- [ACL 2025] Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL☆14Oct 9, 2025Updated 5 months ago
- A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval☆13,681Mar 11, 2026Updated last week
- [ACL 2025 Best Theme Paper] This is the official implementation for the paper: "Meta-rater: A Multi-dimensional Data Selection Method for…☆190Aug 29, 2025Updated 6 months ago
- Data Set Description Language Specification (DSDL, a next-generation AI dataset description language)☆46May 29, 2024Updated last year
- ☆14Apr 19, 2025Updated 11 months ago
- Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.☆56,255Mar 7, 2026Updated 2 weeks ago
- Easy Data Preparation with latest LLMs-based Operators and Pipelines.☆2,992Mar 12, 2026Updated last week
- Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, …☆13,120Mar 14, 2026Updated last week
- ☆12Sep 7, 2024Updated last year
- WanJuan-CC is high-quality data derived from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning.☆13Apr 18, 2024Updated last year
- A unified evaluation library for multiple machine learning libraries☆269Mar 29, 2024Updated last year
- Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).☆7,172Oct 30, 2025Updated 4 months ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale☆267Jul 8, 2025Updated 8 months ago
- [ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models☆359Mar 22, 2024Updated last year
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …☆833Mar 17, 2025Updated last year
- Baidu QA dataset with 1 million entries☆45Nov 30, 2023Updated 2 years ago
- Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)☆15May 2, 2025Updated 10 months ago
- A TTS Trained on Universal Audio.☆41Jun 6, 2025Updated 9 months ago
- WanJuan 1.0 multimodal corpus☆571Oct 20, 2023Updated 2 years ago
- A Next-Generation Training Engine Built for Ultra-Large MoE Models☆5,104Updated this week