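PDF parsing (text, sections, tables, images, references), plus PDF question answering, summarization, and information extraction built on large language models (ChatGLM2-6B, RWKV) + langchain + streamlit
☆211 · Oct 17, 2023 · Updated 2 years ago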
Alternatives and similar repositories for pdf_parsing
Users who are interested in pdf_parsing are comparing it to the libraries listed below
- ChatPDF: PDF parsing and reading based on LangChain and LLMs (ChatGLM, GPT...) ☆55 · Jun 5, 2024 · Updated last year
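- This project collects open-source datasets for table intelligence tasks (such as table question answering and table-to-text generation), converts the raw data into instruction-tuning format, and fine-tunes LLMs on it to improve their understanding of tabular data, with the ultimate goal of building a large language model specialized for table intelligence tasks. ☆644 · Apr 22, 2024 · Updated last year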
- Study notes on the survey book 《大语言模型》 (Large Language Models) ☆13 · Aug 2, 2024 · Updated last year
- Creating a graph that summarizes correlations between stocks and using a Graph Neural Network to encode that information to be utilized i… ☆18 · May 19, 2023 · Updated 2 years ago
- BAAI/bge-large-zh could not be cloned from Hugging Face, so it was downloaded manually and hosted here for convenience ☆11 · Sep 16, 2023 · Updated 2 years ago
- Hands-on information extraction with llama ☆101 · Apr 29, 2023 · Updated 2 years ago
- Extracts PDF content using RapidOCR ☆186 · Mar 6, 2026 · Updated 2 weeks ago
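- This project is mainly for palmprint feature extraction. The main work includes: 1. palmprint ROI extraction 2. feature extraction network setup 3. feature network training and prediction. The palmprint extraction part mainly follows the `palm_rpi_ext` implementation, with the core call interface located in instance.py; training and inference are in train_palm_ext… ☆12 · Sep 18, 2024 · Updated last year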
- Document orientation classification ☆222 · Feb 3, 2026 · Updated last month
- A hydraulic surrogate model and real-time control methods of urban drainage networks. ☆35 · Jan 7, 2026 · Updated 2 months ago
- Chinese CLIP: supports custom datasets and extracts embeddings from text and images for text-image matching. ☆22 · Sep 14, 2022 · Updated 3 years ago
- Built on the ChatGLM-6B large language model, with document reading added for real-time question answering; supports uploading txt/docx/pdf files. ☆42 · Sep 11, 2023 · Updated 2 years ago
- Multi-Label Text Classification Based On Bert ☆23 · Feb 28, 2023 · Updated 3 years ago
- A simple, easy-to-hack GraphRAG implementation ☆15 · Sep 21, 2024 · Updated last year
- Syncs WeRead (微信读书) highlights and notes to Readwise ☆14 · Jun 1, 2023 · Updated 2 years ago
- A Qwen-VL model that can be successfully fine-tuned with LoRA ☆16 · Oct 27, 2023 · Updated 2 years ago
- GitHub repo for Peifeng's internship project ☆13 · Nov 7, 2023 · Updated 2 years ago
- Official repository for "Unveiling Opinion Evolution via Prompting and Diffusion for Short Video Fake News Detection", ACL Findings 2024. ☆15 · Apr 25, 2025 · Updated 10 months ago
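- FinGLM: dedicated to building an open, non-profit, long-lasting financial large-model project, using open source and openness to advance "AI + finance". ☆2,202 · May 8, 2024 · Updated last year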
- Universal information extraction with instruction learning ☆394 · Feb 28, 2025 · Updated last year
- Repository for the Zhipu AI 2024 Financial Industry Large Model Challenge ☆60 · Feb 19, 2025 · Updated last year
- ☆68 · Jan 20, 2026 · Updated 2 months ago
- Converted the Jina Tokenizer regex pattern to Python. ☆26 · Aug 26, 2024 · Updated last year
- Built on index-tts-vllm; implements and provides an interface service for simulated streaming audio synthesis, plus a client test script ☆27 · Sep 2, 2025 · Updated 6 months ago
- A simple implementation of multi-label text classification with Bert. I will extend the code to a higher version for very long text over 512,… ☆12 · Jun 2, 2021 · Updated 4 years ago
- Improving langchain knowledge graphs using baml ☆43 · Aug 3, 2025 · Updated 7 months ago
- Conversational agents for engineering simulations with minimal human input using Microsoft AutoGen & GPT-4o. ☆41 · Aug 4, 2024 · Updated last year
- ☆35 · Aug 13, 2025 · Updated 7 months ago
- Retrieval-augmented generation (RAG) examples based on large language models ☆173 · May 4, 2025 · Updated 10 months ago
- RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF. A pure native RAG implementation built on a local LLM, embedding model, and reranker model; supports GraphRAG and requires no third-party agent library. ☆841 · Apr 2, 2025 · Updated 11 months ago
- [EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction ☆4,341 · Jul 19, 2025 · Updated 8 months ago
- Question and Answer based on Anything. ☆13,887 · Mar 24, 2025 · Updated 11 months ago
- Table detection and table structure recognition ☆24 · Dec 5, 2022 · Updated 3 years ago
- 360LayoutAnaylsis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute ☆307 · Sep 10, 2024 · Updated last year
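- Text generation with a pretrained BERT model, usable for dialogue, summarization, question generation, and similar tasks. Currently supported strategies: vocabulary insertion and deletion, custom Character Embedding, random word replacement, etc. ☆10 · Jun 1, 2022 · Updated 3 years ago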
- An OCR web service wrapper built on cnstd + cnocr ☆10 · Nov 21, 2021 · Updated 4 years ago
- ☆16 · Apr 7, 2024 · Updated last year
- Limitations of MultiLabel Conditional Generation ☆13 · Oct 12, 2023 · Updated 2 years ago
- Viscacha: a collection of universal information extraction datasets ☆27 · Feb 21, 2024 · Updated 2 years ago