jaepil / pdfminer3k
Python 3 port of pdfminer
☆188 Updated 6 years ago
Related projects:
- Useful test spiders for Scrapy ☆184 Updated 4 years ago
- A pure python based utility to extract text and images from docx files. ☆504 Updated 11 months ago
- A more complete example of programming with PDFMiner, which continues where the default documentation stops ☆215 Updated 4 years ago
- Scrapy extension to control spiders using JSON-RPC ☆295 Updated 5 years ago
- A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab ☆930 Updated 6 years ago
- Python module for JSON data encoding, including jsonlint. See the project Wiki here on Github. Also read the README at the bottom of th… ☆301 Updated 4 years ago
- Python library for parsing .docx (Office Open XML) files ☆52 Updated 4 years ago
- An extendable docx file format parser and converter ☆185 Updated 3 years ago
- PhantomJS Downloader for Scrapy, Yeah! ☆94 Updated 10 years ago
- Python binding to libpoppler-qt5 ☆42 Updated 10 months ago
- Scalable Bloom Filter implemented in Python ☆163 Updated last year
- Python bindings for CHMLIB ☆55 Updated 10 months ago
- A utility to read and write pdfs with Python. Superseded: see https://github.com/knowah/PyPDF2 ☆80 Updated 8 years ago
- Scrapy Middleware to set a random User-Agent for every Request. ☆202 Updated 5 years ago
- Utilities for working with Excel files that require both xlrd and xlwt. ☆273 Updated 5 years ago
- The tutorial for xlrd, xlwt and xlutils ☆309 Updated 4 years ago
- An efficient simhash implementation for python ☆124 Updated 4 years ago
- The simplest way to extract text from PDFs in Python ☆426 Updated 2 years ago
- Conservatively convert html to markdown ☆96 Updated 4 years ago
- Run JavaScript code from Python (EOL: https://gist.github.com/doloopwhile/8c6ec7dd4703e8a44e559411cb2ea221) ☆705 Updated 4 years ago
- Reads, queries and modifies Microsoft Word 2007/2008 docx files. ☆1,069 Updated 9 years ago
- ScrapyDemo : Redis MySQLdb logging IngoreHttpRequestMiddleware UserAgentMiddleware HttpProxyMiddleware rules ☆38 Updated 8 years ago
- CSS Selectors for Python ☆291 Updated 4 months ago
- Python DB-API module for SQLite 3. ☆370 Updated 2 years ago
- A simple python script to translate chinese to pinyin based on Mandarin.dat ☆207 Updated 6 months ago
- Statistical Interactive Visualization with pandas+Jupyter integration on top of Echarts. ☆118 Updated 2 years ago
- a chinese segment base on crf ☆233 Updated 5 years ago
- A library for reading (unencrypted) mobi-reader files in Python ☆153 Updated 9 months ago
- This is the repository for "Classic" wxPython. All new development is happening in the Phoenix project at https://github.com/wxWidgets/Ph… ☆297 Updated 4 years ago
- A fast, pure-Python, untyped, in-memory database engine, using Python syntax to manage data, instead of SQL, inspired by PyDbLite. ☆20 Updated 6 years ago