opendatalab / opendatalab-datasets
datasets resource
☆121Updated 2 months ago
Alternatives and similar repositories for opendatalab-datasets
Users who are interested in opendatalab-datasets are comparing it to the libraries listed below.
- The Open-Source Data Annotation Platform☆918Updated 7 months ago
- WanJuan 1.0 multimodal corpus☆567Updated last year
- Data annotation toolbox supports image, audio and video data.☆1,355Updated last month
- Data Set Description Language (DSDL) Specification, a next-generation dataset description language for AI☆47Updated last year
- ☆542Updated last year
- 360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute☆299Updated last year
- Data annotation component library, provided as NPM packages☆127Updated last month
- Dingo: A Comprehensive AI Data Quality Evaluation Tool☆451Updated this week
- ☆353Updated last year
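- PDF parsing tool: a vLLM-accelerated implementation of GOT, with MinerU handling layout detection and cropping and GOT handling table and formula parsing, for PDF parsing in RAG☆62Updated 10 months ago
- Analysis of Chinese and English layouts☆244Updated last month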
- SDK of OpenDataLab - https://opendatalab.org.cn☆57Updated last month
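- Alpaca Chinese Dataset: a Chinese instruction fine-tuning dataset☆214Updated 11 months ago
- Llama3-Tutorial (XTuner, LMDeploy, OpenCompass)☆512Updated last year
- Inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope.☆372Updated 2 weeks ago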
- ☆25Updated 2 years ago
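- To try textin document parsing, visit https://cc.co/16YSIy☆120Updated 2 months ago
- An ecosystem of large language models and multimodal models, mainly covering cross-modal search, speculative decoding, QAT quantization, multimodal quantization, ChatBot, and OCR☆189Updated last month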
- ☆99Updated 6 months ago
- Tongyi Qianwen (Qwen) vLLM inference deployment demo☆603Updated last year
- An open-source solution for full parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to infer…☆757Updated 6 months ago
- ☆67Updated last year
- Cleaning and quality evaluation of Chinese corpora for large-model pre-training☆69Updated last year
- ☆1,050Updated this week
- A collection of AGI learning resources (mainly covering LLM and AIGC), continuously updated☆425Updated this week
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi…☆290Updated 2 months ago
- Collects the best open-source table recognition models currently available, improves pre- and post-processing, and converts the models to ONNX☆830Updated last month
- Splices the vision head of SmolVLM2 onto the Qwen3-0.6B model and fine-tunes the combined model☆351Updated last week
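- Uses free LLM APIs together with your private-domain data to generate synthetic SFT training data at no cost; supports the training data formats of tools such as llamafactory☆184Updated 9 months ago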
- Train a 1B LLM with 1T tokens from scratch as an individual☆732Updated 4 months ago