jayhenry / pdf2txt_mnbvc
☆41Updated last year
Alternatives and similar repositories for pdf2txt_mnbvc
Users that are interested in pdf2txt_mnbvc are comparing it to the libraries listed below
- General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser☆46Updated 11 months ago
- Chinese-native retrieval-augmented generation evaluation benchmark☆118Updated last year
- TianGong-AI-Unstructure☆65Updated last month
- SearchGPT: Building a quick conversation-based search engine with LLMs.☆46Updated 5 months ago
- 360LayoutAnalysis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute☆283Updated 8 months ago
- A Multi-Modal Dataset of Chinese Governmental Documents☆34Updated 4 years ago
- ☆27Updated 7 months ago
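- A survey of large language model training and serving☆37Updated last year
- This project aims to quickly detect and convert the encodings of large numbers of text files, to support data cleaning for the mnbvc corpus project☆61Updated 7 months ago
- Text deduplication☆72Updated last year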
- ☆66Updated 8 months ago
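- A Chinese self-instruct dataset built with ChatGPT☆117Updated 2 years ago
- Uses an LLM plus a sensitive-word lexicon to automatically determine whether text contains sensitive words.☆124Updated last year
- Chinese corpus cleaning and quality assessment for large-model pre-training☆64Updated 10 months ago
- HanFei-1.0 (韩非), the first legal large language model in China trained with full-parameter training☆116Updated last year
- An open-source NLP auto-annotation tool for the Chinese-speaking world. Leave the easy samples to LabelFast.☆71Updated 4 months ago
- Legal-Eagle-InternLM is a legal question-answering bot based on InternLM (书生浦语), the large language model released by SenseTime and Shanghai AI Laboratory. It is a legal-domain large model that aims to provide users with professional, intelligent, and comprehensive legal services in line with the 3H principles (Helpful, Honest, Harmless).☆57Updated last year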
- ☆162Updated 2 years ago
- The LLM of NL2GQL with NebulaGraph or Neo4j☆92Updated last year
- ☆63Updated 2 years ago
- Universal information extraction (entity/relation/event extraction) based on the Qwen2 model☆31Updated 10 months ago
- ChatGLM2-6B fine-tuning, SFT/LoRA, instruction finetune☆108Updated last year
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus☆194Updated 4 months ago
- A tool for manually annotating and ranking response data in the RLHF stage of large models.☆251Updated last year
- Silk Road will be the dataset zoo for Luotuo (骆驼). Luotuo is an open-source Chinese LLM project founded by 陈启源 @ Central China Normal University & 李鲁鲁 @ SenseTime & 冷子…☆39Updated last year
- LAiW: A Chinese Legal Large Language Models Benchmark☆80Updated 11 months ago
- A large-scale language model for the scientific domain, trained on the RedPajama arXiv split☆133Updated last year
- To try TextIn document parsing, visit https://cc.co/16YSIy☆99Updated 6 months ago
- 🌳CED: Catalog Extraction from Documents☆16Updated last year
- Implements Baichuan-Chat fine-tuning with various methods such as LoRA and QLoRA, runnable with one click.☆70Updated last year