hbh112233abc / pdfplumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
☆58Updated last year
Alternatives and similar repositories for pdfplumber
Users that are interested in pdfplumber are comparing it to the libraries listed below
- change pdf to txt☆67Updated last year
- ☆41Updated 2 years ago
- 360LayoutAnalysis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute☆290Updated 9 months ago
- SMP 2023 ChatGLM Financial LLM Challenge: introduction to a 60-point baseline approach☆185Updated last year
- Python ROUGE Score Implementation for Chinese Language Task (official rouge score)☆102Updated last year
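- HanFei-1.0 (韩非), the first legal large language model in China trained with full-parameter training☆120Updated last year
- Uses an LLM plus a sensitive-word lexicon to automatically determine whether text contains sensitive words.☆124Updated last year
- clueai toolkit: customize the API you need in 3 lines of code and 3 minutes!☆233Updated 2 years ago
- Hands-on information extraction with llama☆100Updated 2 years ago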
- 🌳CED: Catalog Extraction from Documents☆16Updated last year
- "Taoli" (桃李): a large language model for international Chinese-language education☆181Updated last year
- Based on RapidOCR, extract the PDF content☆172Updated last month
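- pke_zh, python keyphrase extraction for chinese(zh). A Chinese keyword and key-sentence extraction tool implementing KeyBert, PositionRank, TopicRank, TextRank, and other algorithms; works out of the box.☆208Updated last year
- Document orientation classification☆219Updated 7 months ago
- To try textin document parsing, visit https://cc.co/16YSIy☆103Updated 7 months ago
- ChatGLM2-6B fine-tuning, SFT/LoRA, instruction finetune☆108Updated last year
- A document search tool built on sentence transformers and chatglm☆156Updated 2 years ago
- basic framework for RAG (retrieval-augmented generation)☆85Updated last year
- company name parser, extract company name brand. A Chinese company-name segmentation tool that extracts the place name, brand name (head word), industry term, and company-name suffix from a company name.☆90Updated 2 years ago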
- ☆66Updated 9 months ago
- [Gap-Tree-Sort algorithm] Performs layout analysis on OCR results or text extracted from PDFs and sorts it into human reading order.☆140Updated last year
- A Multi-Modal Dataset of Chinese Governmental Documents☆34Updated 4 years ago
- ChatGLM-6B fine-tuning.☆135Updated 2 years ago
- Code & Data for our Paper "NaSGEC: Multi-Domain Chinese Grammatical Error Correction for Native Speaker Texts" (ACL 2023 Findings)☆89Updated 4 months ago
- A Chinese self-instruct dataset built with ChatGPT☆118Updated 2 years ago
- Luotuo QA (骆驼QA), a Chinese large language model for reading comprehension.☆74Updated 2 years ago
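- Instruction-tuning tool for large language models (supports FlashAttention)☆173Updated last year
- chatglm-6b fine-tuning/LORA/PPO/inference; samples are automatically generated integer/decimal arithmetic (addition, subtraction, multiplication, division); runs on gpu or cpu☆164Updated last year
- Time extraction, parsing, and normalization tool☆52Updated 2 years ago
- Chinese annotation tool supporting NER, text classification, relation annotation, dialogue annotation, and more.☆76Updated 10 months ago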