datasets resource
☆136 · Apr 14, 2026 · Updated this week
Alternatives and similar repositories for opendatalab-datasets
Users who are interested in opendatalab-datasets are comparing it to the libraries listed below.
- Data Set Description Language Specification (Next-Generation AI Dataset Description Language, DSDL) ☆46 · May 29, 2024 · Updated last year
- SDK of OpenDataLab - https://opendatalab.org.cn ☆59 · Jul 31, 2025 · Updated 8 months ago
- Data annotation component library, provided as NPM packages ☆148 · Updated this week
- AAAI 2024: Visual Instruction Generation and Correction ☆96 · Feb 4, 2024 · Updated 2 years ago
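- WanJuan-CC is a high-quality dataset derived from CommonCrawl through data extraction, rule-based cleaning, deduplication, safety filtering, and quality cleaning. ☆13 · Apr 18, 2024 · Updated 2 years ago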
- Data annotation toolbox that supports image, audio, and video data. ☆1,544 · Updated this week
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆48 · May 24, 2024 · Updated last year
- LabelBee is an annotation library ☆300 · Mar 27, 2026 · Updated 3 weeks ago
- The Open-Source Data Annotation Platform ☆1,211 · Feb 19, 2025 · Updated last year
- ECCV 2024: Parrot Captions Teach CLIP to Spot Text ☆66 · Sep 6, 2024 · Updated last year
- Out-of-the-box Annotation Toolbox ☆396 · Apr 19, 2024 · Updated 2 years ago
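- WanJuan3.0 ("万卷·丝路", "WanJuan·Silk Road"), a comprehensive plain-text corpus that collects publicly available web content, literature, patents, and other materials from multiple countries and regions. The data totals over 1.2 TB and more than 300B tokens, which the authors describe as internationally leading. The first open-source release consists mainly of five subsets (Thai, Russian, Arabic, Korean, and Vietnamese); each subset's data… ☆44 · Feb 13, 2025 · Updated last year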
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆465 · Sep 28, 2025 · Updated 6 months ago
- (ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation ☆96 · Dec 3, 2025 · Updated 4 months ago
- WanJuan 1.0 multimodal corpus ☆572 · Oct 20, 2023 · Updated 2 years ago
- A Python package for interacting with the MinerU Vision-Language Model. ☆112 · Updated this week
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆9,579 · Jan 3, 2025 · Updated last year
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆2,114 · Apr 14, 2025 · Updated last year
- The official implementation of the paper "CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis" ☆16 · Sep 2, 2024 · Updated last year
- KITE (Knowledge-Intensive Task Evaluation) is an end-to-end benchmark for RAG pipelines ☆23 · Aug 14, 2024 · Updated last year
- Services and guidelines for normalizing drug and other therapy terms ☆13 · Feb 26, 2026 · Updated last month
- Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows. ☆60,483 · Updated this week
- ☆14 · Apr 19, 2024 · Updated 2 years ago
- A PyTorch implementation of Cyclical Learning Rates ☆25 · Jan 30, 2018 · Updated 8 years ago
- The complete NUMA-optimized branch of the ktransformers project ☆25 · Nov 3, 2025 · Updated 5 months ago
- Training optimizations for OCR recognition of rare Chinese characters ☆16 · Feb 16, 2023 · Updated 3 years ago
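- A vLLM hybrid-inference extension plugin that supports multi-NUMA hybrid inference; single-GPU inference of the Qwen3-Next model reaches 1000+ prefill ☆32 · Nov 7, 2025 · Updated 5 months ago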
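- A frontend-only New API call testing page for trying out some special calling patterns of OpenAI/Anthropic/Google. All data is processed and stored locally in the browser only. ☆42 · Jan 29, 2026 · Updated 2 months ago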
- ICDO: International Classification of Diseases Ontology ☆12 · Apr 19, 2024 · Updated 2 years ago
- Official repository for ODQA experiments from Decomposed Prompting: A Modular Approach for Solving Complex Tasks, ICLR23 ☆12 · Jul 28, 2023 · Updated 2 years ago
- A SNOMED CT Concept Validation Library using Drools (Business Rules Engine) ☆11 · Apr 9, 2026 · Updated last week
- Using GAN to create synthetic and partially synthetic EEG data to augment training sets for motor imagery interaction tasks ☆13 · Aug 27, 2019 · Updated 6 years ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,667 · Apr 10, 2026 · Updated last week
- Voice activity detection (VAD) library and Go bindings based on WebRTC's VAD engine ☆11 · Mar 1, 2018 · Updated 8 years ago
- 🕸 GlotWeb: Web Indexing for Minority Languages (WWW 2026) ☆17 · Feb 27, 2026 · Updated last month
- Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3). ☆7,187 · Oct 30, 2025 · Updated 5 months ago
- GPT/llama + SQL + PyGWalker + Flask ☆23 · Sep 3, 2023 · Updated 2 years ago
- ☆16 · May 27, 2024 · Updated last year
- Some Useful Tools Code ☆16 · Feb 3, 2026 · Updated 2 months ago