houking-can / PDFConverter
Best PDF Converter! PDF to any format, pdf2word/excel/xml/html/txt...
☆146 · Updated 3 years ago
Alternatives and similar repositories for PDFConverter:
Users who are interested in PDFConverter are comparing it to the libraries listed below.
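- A Python script that converts PDF to TXT using PDFMiner. ☆46 · Updated 3 years ago
- Collected entries from a Tianchi competition. Extracts 18 fields from PDFs: name, year and month of birth, gender, phone number, highest education level, native place, city/county of household registration, political affiliation, university graduated from, employer, job content, position, project name, project responsibilities, degree, graduation date, employment dates, and project dates. ☆112 · Updated 5 months ago
- Legal-domain dictionary. ☆14 · Updated 5 years ago
- For research on AI and law. ☆43 · Updated 4 years ago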
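- A simple, easy-to-use Python module for working with dates and times through strings: regex-based time extraction, time parsing from strings, and time extraction from strings, including extraction of Chinese time expressions from a single sentence. ☆75 · Updated 6 months ago
- A practical tool for converting Chinese PDFs to TXT. ☆30 · Updated 3 years ago
- Chinese text similarity calculator. ☆131 · Updated 3 months ago
- Uses python-opencv to recognize table data in images and convert it to CSV. ☆108 · Updated 4 years ago
- Helps you export table data from PDF files in bulk. ☆39 · Updated 5 years ago
- Chinese sentence similarity computation based on the gensim module. ☆53 · Updated 6 years ago
- Chinese typo (miswritten character) correction algorithm. Calls the pycorrector interface and applies rules. ☆66 · Updated 5 years ago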
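- FinanceEventGraph, an open dataset for a financial-domain event graph that can be used for building and experimenting with event graphs. It includes 3,865 acquire (merger and acquisition) events and 9,093 invest (investment) events, 12,960 events in total. ☆19 · Updated last year
- CCKS2019 evaluation task 5: information extraction from public company announcements (3rd place). ☆122 · Updated 5 years ago
- Recognizes tables and text in scanned images that contain tables. ☆251 · Updated last year
- RelExt: A Tool for Relation Extraction from Text (extracts entity relations from text). ☆48 · Updated 2 years ago
- Time extraction, parsing, and normalization tool. ☆50 · Updated 2 years ago
- Simple OCR for the contents of table images. ☆38 · Updated 5 years ago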
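- Query-to-document matching and search using TF-IDF, LSA, and doc2vec from sklearn and gensim. ☆21 · Updated 2 years ago
- ☆81 · Updated 6 years ago
- Company name parser that extracts the brand from a company name. A Chinese company-name segmentation tool that supports extracting place names, brand names (the head word), industry terms, and company-name suffixes. ☆84 · Updated 2 years ago
- Crawler for China Judgments Online (裁判文书网). ☆37 · Updated last year
- Line alignment for PaddleOCR output: OCR line alignment for table-format images. ☆38 · Updated 3 years ago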
- Liberate all kinds of data from PDF and other unstructured formats and make the information machine-readable and visualizable for popul… ☆28 · Updated 6 years ago
- An exploration of Eventline (important news ranked and organized by publication time). For a collection of news reports on a given event topic, it uses the docrank algorithm to identify how important each report is and uses the reports' publication times to select the important… ☆216 · Updated 6 years ago
- Baidu Baike crawler. ☆69 · Updated 7 months ago
- Regex-based extraction and normalization of time keywords. ☆21 · Updated 3 years ago
- A carefully designed OCR pipeline for universal bordered table recognition and reconstruction. ☆171 · Updated 2 years ago
- Text deduplication algorithm based on SimHash. ☆20 · Updated 3 years ago
- Extracts a table from an image and exports it to an Excel file. ☆59 · Updated 6 years ago