dogatana / docx2md
☆35Updated 2 weeks ago
Alternatives and similar repositories for docx2md:
Users who are interested in docx2md are comparing it to the libraries listed below.
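- A Chinese-native retrieval-augmented generation (RAG) evaluation benchmark☆115Updated last year
- General-purpose layout analysis | Chinese document parsing | Document Layout Analysis | layout parser☆46Updated 10 months ago
- In RAG, generating and matching embedding vectors is a key step. This article introduces an embedding service based on CLIP/BLIP models that supports embedding generation and similarity computation for text and images, providing a foundation for multimodal information retrieval.☆23Updated 3 months ago
- To try textin document parsing, visit https://cc.co/16YSIy☆91Updated 5 months ago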
- 360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute☆279Updated 7 months ago
- ☆63Updated 7 months ago
- Scripts for optimizing bge inference☆28Updated last year
- A demo of a PDF parser (including OCR and object detection tools)☆34Updated 6 months ago
- Line-by-line explanation of Qwen 14B and 7B☆57Updated last year
- ☆26Updated 6 months ago
- Quickly build a document question-answering bot based on chatglm☆88Updated last year
- A Multi-Modal Dataset of Chinese Governmental Documents☆32Updated 4 years ago
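- Reading order, Layoutreader☆11Updated 11 months ago
- PDF data for Chinese papers, securities documents, and financial reports☆27Updated 10 months ago
- Fast pdf translate is a PDF translation tool. It uses MinerU to convert PDF to Markdown, splits the Markdown, sends the pieces to a large language model for translation, then assembles the translated results and generates the output PDF with pypandoc.☆16Updated last month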
- CodeAssist is an advanced code completion tool that provides high-quality code completions for Python, Java, C++, and so on. CodeAssist is a…☆58Updated last year
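- Code implementation repository for the paper HiQA☆100Updated last month
- An open-source NLP auto-labeling tool for the Chinese-speaking world. Leave the simple samples to LabelFast.☆70Updated 3 months ago
- gpt_server is an open-source framework for production-grade deployment of LLMs or Embedding models.☆167Updated last week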
- Reproduction of the paper PDFTriage: Question Answering over Long, Structured Documents☆40Updated last year
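- TianGong-AI-Unstructure☆63Updated last week
- ✏️A zero-cost, hands-on LLM fine-tuning project. ⚡️Train a legal LLM step by step on colab, based on microsoft/phi-1_5 and chatglm3, including lora fine-tuning and full-parameter fine-tuning☆72Updated last year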
- Use chatGLM to perform text embedding☆45Updated 2 years ago
- Demonstrates the remarkable effect of vllm on Chinese large language models☆31Updated last year
- Document orientation classification☆216Updated 5 months ago
- ChatGLM2-6B fine-tuning, SFT/LoRA, instruction finetune☆107Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs☆135Updated 4 months ago
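- General-purpose information extraction (entity/relation/event extraction) based on the Qwen2 model☆31Updated 9 months ago
- This project uses layoutlmv3 for training and inference on Chinese images. It mainly addresses three problems: 1. standardizing data into a usable training dataset format 2. modifying the layoutlmv3-base-chinese tokenization 3. splitting and sliding-window handling of text longer than 512☆44Updated 7 months ago
- GTS Engine: A powerful NLU Training System. GTS Engine (GTS-Engine) is an out-of-the-box, high-performance natural language understanding engine focused on few-shot tasks; it can automatically produce NLP models from only a small number of samples.☆91Updated 2 years ago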