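PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction built on large language models (ChatGLM2-6B, RWKV) with langchain and streamlit
☆212 · Oct 17, 2023 · Updated 2 years ago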
Alternatives and similar repositories for pdf_parsing
Users that are interested in pdf_parsing are comparing it to the libraries listed below.
- ChatPDF: PDF parsing and reading based on LangChain and LLMs (ChatGLM, GPT...) ☆55 · Jun 5, 2024 · Updated last year
- DB-based Optical Chemical Structure Recognition ☆12 · Sep 12, 2022 · Updated 3 years ago
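- This project collects open-source datasets for table intelligence tasks (such as table question answering and table-to-text generation), converts the raw data into instruction-tuning format, and fine-tunes LLMs on it to strengthen their understanding of tabular data, with the goal of building a large language model dedicated to table intelligence tasks. ☆643 · Apr 22, 2024 · Updated last year
- Creating a graph that summarizes correlations between stocks and using a Graph Neural Network to encode that information to be utilized i… ☆18 · May 19, 2023 · Updated 2 years ago
- In RAG, generating and matching embedding vectors is a key step. This article introduces an embedding service based on CLIP/BLIP models that supports embedding generation and similarity computation for text and images, providing a foundation for multimodal information retrieval. ☆42 · Dec 28, 2024 · Updated last year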
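- BAAI/bge-large-zh could not be cloned from Hugging Face, so it was downloaded manually and is kept here for convenience ☆11 · Sep 16, 2023 · Updated 2 years ago
- Hands-on information extraction with llama ☆101 · Apr 29, 2023 · Updated 2 years ago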
- Extract PDF content based on RapidOCR ☆187 · Mar 6, 2026 · Updated last month
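- This project is mainly for palmprint feature extraction. The main work includes: 1. palmprint ROI extraction 2. feature extraction network setup 3. feature network training and prediction. The palmprint extraction part mainly follows the `palm_rpi_ext` implementation, with the core entry point in instance.py; training and inference are in train_palm_ext… ☆12 · Sep 18, 2024 · Updated last year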
- A question-answering knowledge base built from PDF documents with LangChain ☆39 · Apr 12, 2024 · Updated last year
- Document orientation classification ☆221 · Feb 3, 2026 · Updated 2 months ago
- A hydraulic surrogate model and real-time control methods of urban drainage networks. ☆37 · Jan 7, 2026 · Updated 3 months ago
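- Chinese CLIP: supports custom datasets and extracts vectors from text and images for text-image matching. ☆22 · Sep 14, 2022 · Updated 3 years ago
- Built on the large language model ChatGLM-6B, with document reading added for real-time question answering; supports uploading txt/docx/pdf and other file types. ☆42 · Sep 11, 2023 · Updated 2 years ago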
- Multi-Label Text Classification Based On Bert ☆23 · Feb 28, 2023 · Updated 3 years ago
- A simple, easy-to-hack GraphRAG implementation ☆15 · Sep 21, 2024 · Updated last year
- Sync WeChat Reading (微信读书) highlights and notes to Readwise ☆14 · Jun 1, 2023 · Updated 2 years ago
- A Qwen-VL model that can be successfully fine-tuned with LoRA ☆16 · Oct 27, 2023 · Updated 2 years ago
- Code for DBellQuant ☆34 · Jan 30, 2026 · Updated 2 months ago
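- Builds an Agent on the InternLM chat 7B base model that can call the MMYOLO tool to perform visual tasks on images ☆11 · Oct 30, 2024 · Updated last year
- FinGLM: dedicated to building an open, public-interest, long-lasting financial large language model project that uses open source and openness to advance "AI + finance". ☆2,211 · May 8, 2024 · Updated last year
- Repository for the Zhipu AI 2024 Financial Industry Large Model Challenge ☆60 · Feb 19, 2025 · Updated last year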
- Universal information extraction with instruction learning ☆399 · Feb 28, 2025 · Updated last year
- Converted the Jina Tokenizer regex pattern to Python. ☆26 · Aug 26, 2024 · Updated last year
- A simple implementation of multi-label text classification with BERT. I will extend the code to a higher version for very long text over 512,… ☆12 · Jun 2, 2021 · Updated 4 years ago
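- Built on index-tts-vllm; implements and provides an API service that simulates streaming audio synthesis, plus a client test script ☆26 · Sep 2, 2025 · Updated 7 months ago
- Deploys YOLO11 table detection with OpenCV. It is the 2nd-place solution in the Baidu Netdisk AI Competition table detection track and comprises three modules: table box detection, table corner detection, and table orientation classification. As before, I wrote both C++ and Python versions of the program. ☆13 · Dec 12, 2024 · Updated last year
- An example of retrieval-augmented generation (RAG) based on large language models ☆173 · May 4, 2025 · Updated 11 months ago
- RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF. A pure native RAG implementation based on a local LLM, embedding model, and reranker model; supports GraphRAG, with no need to install any third-party agent library. ☆844 · Apr 2, 2025 · Updated last year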
- The online version is temporarily unavailable because we cannot afford the key. You can clone and run it locally. Note: we set default ope… ☆828 · May 28, 2024 · Updated last year
- Baseline method for CROCS 2024 ☆10 · Jan 24, 2024 · Updated 2 years ago
- [EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction ☆4,362 · Jul 19, 2025 · Updated 8 months ago
- Question and Answer based on Anything. ☆13,932 · Mar 24, 2025 · Updated last year
- Papers related to information extraction. ☆78 · Apr 13, 2023 · Updated 2 years ago
- Table detection and table structure recognition ☆24 · Dec 5, 2022 · Updated 3 years ago
- 360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute ☆306 · Sep 10, 2024 · Updated last year
- ☆14 · Apr 15, 2024 · Updated last year
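- Text generation with a pretrained BERT model, usable for dialogue, summarization, question generation, and similar tasks. Currently supported strategies: vocabulary insertion and deletion, custom Character Embedding, random word replacement, etc. ☆10 · Jun 1, 2022 · Updated 3 years ago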
- The implementation of "Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Deco… ☆38 · Aug 29, 2025 · Updated 7 months ago