zytedata / clear-html
Remove DIVs, style stuff and normalize HTML preserving structure information
☆13 · Updated 3 months ago
Alternatives and similar repositories for clear-html
Users who are interested in clear-html are comparing it to the libraries listed below:
- Datasette plugin for searching all searchable tables at once ☆29 · Updated 2 months ago
- Blueprint by Mozilla.ai for answering questions about structured documents ☆37 · Updated 10 months ago
- Via Text Density Simple Web Crawler With Go ☆12 · Updated 2 years ago
- Web scraping Page Objects core library ☆104 · Updated this week
- Common crawl extractor ☆84 · Updated last year
- Paste Word, get Markdown ☆17 · Updated last year
- Remote web browser automation. ☆23 · Updated last year
- Create an LLM XML context document from an llms.txt file ☆23 · Updated last year
- A Python interface for the Chrome DevTools Protocol. Enables direct control of Chrome without external automation drivers. ☆23 · Updated last month
- Pluggable DSL that uses pipes to perform a series of linear transformations to extract data ☆16 · Updated last year
- LLM plugin for embeddings using sentence-transformers ☆74 · Updated 9 months ago
- Vector Search Benchmarking suite ☆11 · Updated 2 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Updated last year
- Zyte API integration for Scrapy ☆39 · Updated last week
- ☆20 · Updated 2 weeks ago
- Scrape various open data directories to create an index of what's available out there ☆37 · Updated 11 months ago
- A helper library full of URL-related heuristics. ☆73 · Updated 4 months ago
- Git scrapers for scraping the fediverse ☆19 · Updated this week
- scraping and querying documents for LLMs ☆24 · Updated 3 months ago
- Dockerized FastAPI wrapper around the recognize-anything image recognition models ☆25 · Updated last year
- Python wrapper for Ferret ☆45 · Updated 4 years ago
- ATUI - Assistant Textual User Interface ☆23 · Updated 2 years ago
- Neural search engine for discovering semantically similar Python repositories on GitHub ☆29 · Updated last year
- Loadable spellfix1 extension for sqlite as python package ☆27 · Updated last year
- Scrapfly Python SDK for headless browsers and proxy rotation ☆50 · Updated 3 weeks ago
- 💎🐍 A robust job orchestration framework for Python, backed by modern PostgreSQL ☆177 · Updated this week
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆48 · Updated last year
- Spider ported to Python ☆103 · Updated last week
- ☆19 · Updated last year
- Python JSON benchmarking and "correctness". ☆36 · Updated 2 years ago