joliver1981 / PDFSplitter
Python script to split PDF files into separate files based on bookmarks
☆14 Updated 2 years ago
Alternatives and similar repositories for PDFSplitter:
Users who are interested in PDFSplitter are comparing it to the libraries listed below
- Applied BERT based model to extract relations from 29 annual reports of listed companies and news; Used spaCy library and BERT model for … ☆12 Updated 2 years ago
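- This project mainly extracts key information from medical record files and displays the extracted content in a Streamlit front end. Currently supported file types: images, PDF files, and Word files. ☆23 Updated 2 years ago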
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆22 Updated 4 years ago
- Probabilistic Key Value pair extraction using word weights from Invoices - Non Searchable PDF ☆18 Updated 3 years ago
- Elasticsearch with T5/BERT/other models provided by Hugging Face Transformers. ☆14 Updated last year
- Search PDFs using Jina, DocArray and Jina Hub ☆55 Updated 2 years ago
- Collecting news articles for all the companies in the R1000, for a pre-defined set of news outlets, using Diffbot's Knowledge Graph ☆11 Updated last year
- With this Python script, the mouse pointer is moved periodically in order to bypass idle detection. ☆13 Updated 2 years ago
- An intelligent OCR tool that detects tables and plain text inside PDFs and produces a CSV file and a TXT file from them ☆14 Updated 6 years ago
- Simple PDF-to-text conversion with Python using PDFtk and PyPDF2 ☆20 Updated last year
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆63 Updated this week
- ☆12 Updated 8 months ago
- A Python library for extracting text from PDFs without losing the formatting of the PDF content. ☆75 Updated 3 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 Updated last year
- Combine Tencent's bert as service model and rasa_nlu for text classification ☆20 Updated 2 years ago
- A Google Trends Analytics Package ☆13 Updated 7 months ago
- A dataset for business models for small companies and NLP research.