flexpaper / pdf2json
PDF2JSON is a conversion library based on XPDF (3.02) that can be used for high-performance, page-by-page conversion of PDF documents to JSON and XML. It can also compress the output to minimize its size. PDF2JSON is available for Windows, OSX and Linux. Please see https://flowpaper.com for more information.
☆317 · Updated 5 years ago
Alternatives and similar repositories for pdf2json
Users who are interested in pdf2json are comparing it to the libraries listed below.
- Utilities for the Gmail API over OAuth2 ☆415 · Updated 2 years ago
- ☆391 · Updated last year
- Web clipper browser extension for saving highlights, screenshots, and automatically extracting content from web pages. ☆374 · Updated 4 years ago
- Query CSVs using SQL ☆167 · Updated 6 years ago
- An algorithm for generating robust XPath locators for web testing. ☆185 · Updated 3 years ago
- Extract and decompose (fuzzy) URLs (including emails, which are conceptually a part of URLs) in texts with Area-Pattern-based modularity ☆358 · Updated 11 months ago
- Source code of my personal blog ☆350 · Updated 11 months ago
- JSON processing utility ☆508 · Updated 3 years ago
- Using XPDF, pdftojson extracts text from PDF files as JSON, including word bounding boxes. ☆147 · Updated 2 years ago
- pdftilecut lets you subdivide PDF pages into smaller pages so you can print them on small-form printers. ☆361 · Updated last year
- A versioning data store for time-variant graph data. ☆344 · Updated last year
- Geocode rows in a SQLite database table ☆237 · Updated 3 years ago
- Interactive visualization library for concept maps ☆92 · Updated 6 years ago
- Repository for the Scan Your Pdf community ☆657 · Updated last month
- API for extracting a table from an image or a PDF ☆90 · Updated last year
- Qbix Platform for powering Social Apps (http://qbix.com/platform) ☆93 · Updated last year
- DOM Recorder ☆190 · Updated 4 years ago
- DropBox/GoogleDrive-style 2-way sync using rsync and fswatch ☆143 · Updated 5 years ago
- Apify actor that opens a web page in headless Chrome and analyzes the HTML and JavaScript objects, looks for schema.org microdata and JSO… ☆153 · Updated 2 years ago
- A copy of the original Arc90 repo with links to many of the current ports. ☆240 · Updated last year
- ☆157 · Updated 4 years ago
- Repository for Pipes ☆277 · Updated 5 months ago
- ☆177 · Updated 5 years ago
- Textricator is a tool to extract text from documents and generate structured data. ☆351 · Updated 10 months ago
- JavaScript port of TLSH (Trend Micro Locality Sensitive Hash) ☆162 · Updated 4 years ago
- WarcDB: Web crawl data as SQLite databases. ☆404 · Updated last year
- Email screening workflow ☆101 · Updated last year
- An interactive demo walk-through we built to give visitors a feel for what the Trevor.io platform does ☆251 · Updated 5 years ago
- A Global Exhaustive First and Last Name Database ☆738 · Updated 2 years ago
- Human Response Code: Designed to be recognized by humans and OCR. Encodes all valid URL characters to images. ☆228 · Updated 6 years ago