opendatalab / opendatalab-datasets
datasets resource
☆113Updated 3 weeks ago
Alternatives and similar repositories for opendatalab-datasets
Users who are interested in opendatalab-datasets are comparing it to the libraries listed below.
- Data Set Description Language Specification (DSDL, a next-generation dataset description language for AI)☆47Updated 11 months ago
- Data annotation component library, provided as NPM packages☆93Updated 2 weeks ago
- SDK of OpenDataLab - https://opendatalab.org.cn☆57Updated last year
- ☆25Updated 2 years ago
- The Open-Source Data Annotation Platform☆811Updated 2 months ago
- WanJuan 1.0 multimodal corpus☆560Updated last year
- Data annotation toolbox supports image, audio and video data.☆1,190Updated 2 weeks ago
- Dingo: A Comprehensive Data Quality Evaluation Tool☆144Updated last week
- ☆489Updated 9 months ago
- [ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models☆347Updated last year
- ☆324Updated 11 months ago
- vLLM-accelerated implementation of GOT, combined with MinerU for PDF parsing in RAG☆56Updated 6 months ago
- 360LayoutAnalysis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute☆281Updated 8 months ago
- ☆18Updated this week
- ☆226Updated last year
- Enhance LLM agents with rich tool APIs☆387Updated 8 months ago
- Open foundation models, such as LLama2, ChatGLM, etc.☆114Updated 7 months ago
- Applications of large language models and multimodal models, mainly covering small models, agents, cross-modal search, OCR, RAG, chatbots, and more☆170Updated this week
- Awesome LLM Benchmarks to evaluate the LLMs across text, code, image, audio, video and more.☆140Updated last year
- Alpaca Chinese Dataset: a Chinese instruction fine-tuning dataset☆203Updated 7 months ago
- Generate dialog data from documents using LLMs such as ChatGLM2 or ChatGPT☆157Updated last year
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆424Updated last month
- A multi-dimensional Chinese alignment evaluation benchmark for large language models (ACL 2024)☆386Updated 9 months ago
- ☆59Updated last year
- Technical sharing focused on dialogue systems, centered on the column "Dify Application Operations and Source Code Analysis".☆93Updated 10 months ago
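- WanJuan3.0 ("WanJuan·Silk Road") is a comprehensive plain-text corpus that collects publicly available web content, literature, patents, and other material from multiple countries and regions. Its total size exceeds 1.2 TB with more than 300B tokens, which the authors describe as internationally leading. The first open-source release mainly consists of five subsets: Thai, Russian, Arabic, Korean, and Vietnamese; the data of each subset…☆24Updated 3 months ago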
- As the name suggests: a hand-built RAG☆122Updated last year
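- Uses free LLM APIs together with your private-domain data to generate synthetic SFT training data at no cost; supports the training data formats of tools such as llamafactory☆160Updated 5 months ago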
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents☆472Updated last month
- vLLM Documentation in Simplified Chinese☆69Updated this week