zytedata / clear-html
Remove DIVs, style stuff and normalize HTML preserving structure information
☆11 Updated 2 weeks ago
Alternatives and similar repositories for clear-html
Users who are interested in clear-html are comparing it to the libraries listed below.
- ☆37 Updated 5 months ago
- Remote web browser automation. ☆22 Updated last year
- Common crawl extractor ☆80 Updated last year
- Pluggable DSL that uses pipes to perform a series of linear transformations to extract data ☆16 Updated last year
- Autogenerated CDP utilities that enable Python to control Chrome directly, without external automation drivers. ☆17 Updated last week
- The Web Scraping Club Free Repository ☆151 Updated 5 months ago
- httpx transport for curl_cffi (python bindings for curl-impersonate) ☆23 Updated 2 months ago
- Spider ported to Python ☆94 Updated 8 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆25 Updated 11 months ago
- A microservice for document conversion at scale ☆79 Updated last week
- Dockerized FastAPI wrapper around the recognize-anything image recognition models ☆25 Updated last year
- Some tough questions to test new models. ☆28 Updated last year
- LLM plugin for embeddings using sentence-transformers ☆72 Updated 5 months ago
- The short and sweet way to create API clients in Python ☆25 Updated 3 weeks ago
- aiohttp-like interface to chromium. based on selenium_driverless to bypass cloudflare ☆58 Updated last month
- Python JSON benchmarking and "correctness". ☆35 Updated 2 years ago
- pai: A Python REPL with a built in AI agent ☆42 Updated 2 years ago
- Via Text Density Simple Web Crawler With Go ☆13 Updated 2 years ago
- Create an LLM XML context document from an llms.txt file ☆22 Updated last year
- Paste Word, get Markdown ☆17 Updated last year
- ☆26 Updated last year
- ☆20 Updated 6 months ago
- Hybrid Search (BM25 & Vector) with SQLite ☆23 Updated last year
- Library that helps use puppeteer in scrapy. ☆52 Updated 2 months ago
- ATUI - Assistant Textual User Interface ☆23 Updated last year
- Simple implementation of a GPT (training and inference) in PyTorch. ☆12 Updated last year
- See how HTTPX, Requests, and AIOHTTP libraries compare for sending network requests and find out which one may fit your case better. ☆19 Updated 3 weeks ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆50 Updated last week
- The official Python library for Formulaic ☆16 Updated last year
- A polite and user-friendly downloader for Common Crawl data ☆56 Updated 2 months ago