janosh / pdf-compressor
CLI + Python API for batch compressing PDFs
☆32 Updated last month
Alternatives and similar repositories for pdf-compressor
Users who are interested in pdf-compressor are comparing it to the libraries listed below.
- ☆43 Updated 7 months ago
- Type discovery for Python ☆24 Updated 9 years ago
- This repository provides various Python methods for finding and aggregating synonyms for an individual word or a list of words. ☆33 Updated 2 years ago
- Python command line application to convert Markdown to PDF. ☆54 Updated last year
- A small framework taking over the manual training process described in the Tesseract3 Wiki: https://code.google.com/p/tesseract-ocr/wiki/… ☆132 Updated 2 years ago
- Common Crawl Index Server ☆70 Updated 5 months ago
- Library for extracting text and timestamps from multiple subtitle files (.ass, .ssa, .srt, .sub, .txt). ☆53 Updated last year
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 Updated last year
- Python code and data for the post "Word Segmentation, or Makingsenseofthis" ☆17 Updated 2 years ago
- tools for creating, inspecting and modifying torrent files ☆12 Updated 3 years ago
- Convert text from PDF to XML. ☆45 Updated 6 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz ☆38 Updated last year
- Automatically exported from code.google.com/p/guess-language ☆52 Updated last year
- Yet another tool to search through your (exported) ChatGPT conversations ☆12 Updated 10 months ago
- Extract docx headers, footers, (formatted) text, footnotes, endnotes, properties, and images. ☆187 Updated last week
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 2 years ago
- A curated list of resources around PDF files ☆137 Updated last year
- "Python Rule-based feAture sTructure Analysis" or "Python Rule-bAsed Text Analysis" ☆70 Updated 4 years ago
- PDF to XML ALTO file converter ☆248 Updated this week
- Generate a regular expression that describes a set of strings. ☆30 Updated 2 years ago
- Conversion tools from various formats to StarDict. ☆33 Updated this week
- convert epub file to txt ☆90 Updated 5 years ago
- ☆11 Updated 4 years ago
- Index and search PDF files using Apache Lucene and PDF Box ☆44 Updated last month
- The Python Package Index (PyPI) contains over 300,000 Python packages. Need a Python library but don't want to search through all the opt… ☆15 Updated 3 years ago
- Advanced similarity and duplicate source code at scale. ☆55 Updated 6 years ago
- Desktop full-text search tool ☆34 Updated last year
- Python Unicode Block Utilities ☆24 Updated 4 years ago
- Clean useless .sdr folders in your Kindle. ☆16 Updated this week
- LaMachine - A software distribution of our in-house as well as some 3rd party NLP software - Virtual Machine, Docker, or local compilatio… ☆68 Updated last year