data-liberation / data-liberation-resources
Liberate all kinds of data from PDFs and other unstructured formats, and make the information machine-readable and visualizable with popular tools.
☆31 · Updated 7 years ago
Alternatives and similar repositories for data-liberation-resources
Users who are interested in data-liberation-resources are comparing it to the libraries listed below.
- Baike schema crawler for Baidu Baike and Hudong Baike: scripts that crawl the concept taxonomies of both encyclopedias ☆38 · Updated 7 years ago
- ☆70 · Updated 7 years ago
- Table understanding dataset for comparative evaluation of different table understanding algorithms ☆14 · Updated 7 years ago
- Chinese Environment Emergency Corpus, Semantic Intelligence Lab, Shanghai University ☆46 · Updated 9 years ago
- Extract templated Open Information Extraction ☆17 · Updated 8 years ago
- ☆23 · Updated 5 years ago
- Framework for information extraction from tables ☆41 · Updated 6 years ago
- BlackLab Frontend, a feature-rich corpus search interface for BlackLab ☆22 · Updated 3 weeks ago
- ICDAR 2021 Competition on Scientific Literature Parsing ☆35 · Updated 5 years ago
- SegPhrase for Chinese and Arabic ☆36 · Updated 8 years ago
- Data collection, alignment and TAUS repository ☆23 · Updated 7 years ago
- ☆95 · Updated 5 years ago
- Mines element recognition rules from the CEC corpus and uses them to automatically annotate raw news-report text ☆20 · Updated 10 years ago
- A tool for extracting arbitrary tables from untagged PDF documents ☆39 · Updated 4 years ago
- ☆40 · Updated 4 years ago
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆109 · Updated last year
- Table Extraction Tool ☆90 · Updated 7 years ago
- ☆81 · Updated 3 years ago
- ☆87 · Updated 5 years ago
- XFUND: A Multilingual Form Understanding Benchmark ☆208 · Updated 3 years ago
- Fine-tune the BLOOM large language model with LoRA ☆32 · Updated 2 years ago
- schemakg, a knowledge graph of schemas that aims to cover as broad a range as possible, including entity schemas and event schemas… ☆31 · Updated 4 years ago
- Optical table recognition: recognize tables in scanned images using OpenCV ☆112 · Updated 6 years ago
- MNBVC project: ShareGPT corpus cleaning ☆15 · Updated last year
- Chinese character feature extraction tool that extracts pronunciation (initial, final, tone), glyph (components, radicals), four-corner codes, and other features, which can also be fed to models as tensors ☆137 · Updated 5 years ago
- PAGE XML format collection for document image page content and more ☆67 · Updated 4 years ago
- A dataset of 403 images for table detection in documents ☆83 · Updated 6 years ago
- PDF table extraction ☆10 · Updated 3 years ago
- Functional and structural analysis of tables in research papers (table disentangling) ☆20 · Updated 8 years ago
- An open-source classical Chinese information processing toolkit developed by the Tsinghua Natural Language Processing Group ☆51 · Updated 6 years ago