btimby / fulltext
Python library for extracting text from various file formats (for indexing).
☆111 · Updated 3 years ago
Alternatives and similar repositories for fulltext:
Users who are interested in fulltext are comparing it to the libraries listed below.
- Fast multi-keyword search engine for text strings ☆252 · Updated 5 months ago
- Python powered spreadsheets ☆173 · Updated 6 years ago
- Regular Expression based parsers for extracting data from natural languages ☆70 · Updated 7 years ago
- Extract structured data from HTML and XML documents like a boss. ☆49 · Updated 2 months ago
- Asynchronous Python HTTP Requests for Humans using twisted ☆31 · Updated 5 years ago
- URL normalization for Python ☆94 · Updated 2 years ago
- Python wrapper for RE2 ☆100 · Updated 5 months ago
- Automatically exported from code.google.com/p/solrpy ☆40 · Updated 4 years ago
- PyQuery-based scraping micro-framework. ☆116 · Updated 3 years ago
- WTForms integration for peewee ☆111 · Updated 3 months ago
- Library to query Python dicts ☆90 · Updated last year
- Python 3 AsyncIO powered scraping framework with batteries included ☆20 · Updated 8 years ago
- Python Regular Expressions for Humans™. ☆229 · Updated 6 years ago
- 📚 Ordered Multivalue Dictionary. Helps power furl. ☆68 · Updated 2 years ago
- A Python library for extracting titles, images, descriptions and canonical urls from HTML. ☆147 · Updated 4 years ago
- URL Transformation, Sanitization ☆103 · Updated last year
- Extract text from HTML ☆133 · Updated 4 years ago
- A Python library for finding feed links on websites. ☆50 · Updated 2 years ago
- Remove signature blocks from emails ☆85 · Updated 5 years ago
- Async Python client for the sonic search backend ☆135 · Updated this week
- A module for querying the DOM tree and writing XPath expressions using native Python syntax. ☆127 · Updated 6 years ago
- Python light ORM for Redis ☆79 · Updated 6 years ago
- Extract difference between two HTML pages ☆32 · Updated 6 years ago
- Library to populate items using XPath and CSS with a convenient API ☆46 · Updated 2 weeks ago
- Python library for manipulating URLs (and some URIs) in a more natural way. ☆187 · Updated 3 years ago
- Restrict crawl and scraping scope using matchers. ☆25 · Updated 8 years ago
- Simple Python cache and memoizing module ☆84 · Updated 10 months ago
- Pure Python wrapper to the Yajl C Library ☆82 · Updated 2 months ago
- Simple Web UI for Scrapy spider management via Scrapyd ☆51 · Updated 6 years ago
- Scrapy downloader middleware that stores response HTMLs to disk. ☆18 · Updated 9 months ago