lipoja / URLExtract
URLExtract is a Python class for collecting (extracting) URLs from a given text by locating TLDs.
☆265 Updated last year
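To make the idea of TLD-based extraction concrete, here is a minimal standard-library sketch. It is not URLExtract's actual implementation: the function name `find_urls_by_tld`, the tiny `SAMPLE_TLDS` set, and the `STOP_CHARS` delimiter set are all assumptions chosen for illustration. It finds a known TLD in the text, then expands left and right until it reaches a delimiter.

```python
import re

# Hypothetical, tiny TLD sample; a real extractor would load the full IANA list.
SAMPLE_TLDS = {"com", "org", "net", "io"}

# Assumed delimiters that end a URL candidate: whitespace plus a few brackets and quotes.
STOP_CHARS = set(" \t\n\r<>\"'()[]{}")


def find_urls_by_tld(text, tlds=SAMPLE_TLDS):
    """Return URL-like substrings found by locating a TLD and expanding around it."""
    # Longest TLDs first so that overlapping alternatives match greedily.
    alternatives = "|".join(sorted(map(re.escape, tlds), key=len, reverse=True))
    # The TLD must not be followed by another label character (".com" in ".community").
    pattern = re.compile(r"\.(?:" + alternatives + r")(?![A-Za-z0-9-])", re.IGNORECASE)

    urls = []
    last_end = 0
    for match in pattern.finditer(text):
        if match.start() < last_end:
            continue  # this TLD is already inside the previous candidate
        # Expand left from the dot until a delimiter or the previous candidate.
        start = match.start()
        while start > last_end and text[start - 1] not in STOP_CHARS:
            start -= 1
        # Expand right from the end of the TLD until a delimiter.
        end = match.end()
        while end < len(text) and text[end] not in STOP_CHARS:
            end += 1
        # Drop trailing sentence punctuation that is unlikely to belong to the URL.
        candidate = text[start:end].rstrip(".,;:!?")
        if candidate and not candidate.startswith("."):
            urls.append(candidate)
        last_end = end
    return urls


print(find_urls_by_tld("Visit example.com or https://docs.python.org/3/ today."))
```

The sketch skips everything a production extractor has to handle, such as internationalized domains, IP addresses, e-mail addresses, and keeping the TLD list up to date.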
Alternatives and similar repositories for URLExtract
Users who are interested in URLExtract are comparing it to the libraries listed below.
- URL normalization for Python ☆96 Updated 3 months ago
- Extract text from HTML ☆134 Updated 5 years ago
- Extracts the top level domain (TLD) from the URL given. ☆182 Updated 2 months ago
- Parse numbers written in natural language ☆122 Updated 9 months ago
- Extract price amount and currency symbol from a raw text string ☆333 Updated 5 months ago
- Parsing JavaScript objects into Python data structures ☆211 Updated last month
- universal character encoding detector ☆405 Updated 2 months ago
- Common interface for data container classes ☆68 Updated last week
- A pure-Python robots.txt parser with support for modern conventions. ☆70 Updated last week
- Modern robots.txt Parser for Python ☆195 Updated last year
- A python based HTML to text conversion library, command line client and Web service. ☆316 Updated 2 months ago
- A modern CSS selector implementation for BeautifulSoup ☆244 Updated this week
- A Python library for working with and comparing language codes. ☆345 Updated 2 months ago
- python library for getting metadata ☆146 Updated last week
- ASCII transliterations of Unicode text - GitHub mirror ☆576 Updated 3 months ago
- python library to simplify working with jsonlines and ndjson data ☆296 Updated 11 months ago
- universal character encoding detector ☆59 Updated 10 months ago
- Async WebDriver implementation for asyncio and asyncio-compatible frameworks ☆359 Updated last year
- A simple, Qt-Webengine powered web browser with built in functionality for basic scrapy webscraping support. ☆110 Updated last year
- Atom, RSS and JSON feed parser for Python 3 ☆117 Updated 2 years ago
- Generator of User-Agent header ☆342 Updated last year
- Library to populate items using XPath and CSS with a convenient API ☆47 Updated this week
- Web scraping Page Objects core library ☆102 Updated last month
- Automatic unit test generation for Scrapy. ☆57 Updated 4 years ago
- ndjson with the same interface as the builtin json module ☆68 Updated 2 years ago
- Allowlist-based HTML cleaner ☆150 Updated last month
- Fast multi-keyword search engine for text strings ☆256 Updated 10 months ago
- Python wrapper for RE2 ☆103 Updated 2 months ago
- Accurately find/replace/remove emojis in text strings ☆163 Updated last year
- Page Object pattern for Scrapy ☆123 Updated 3 weeks ago