zytedata / clear-html
Remove DIVs, style stuff and normalize HTML preserving structure information
☆13 Updated 3 months ago
Alternatives and similar repositories for clear-html
Users that are interested in clear-html are comparing it to the libraries listed below
- Pluggable DSL that uses pipes to perform a series of linear transformations to extract data ☆16 Updated last year
- Datasette plugin for searching all searchable tables at once ☆29 Updated 3 months ago
- A Python interface for the Chrome DevTools Protocol. Enables direct control of Chrome without external automation drivers. ☆24 Updated 2 months ago
- Simple implementation of a GPT (training and inference) in PyTorch. ☆13 Updated 2 years ago
- Remote web browser automation. ☆23 Updated last year
- Common crawl extractor ☆84 Updated last year
- Git scrapers for scraping the fediverse ☆19 Updated this week
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated last year
- LLM plugin for embeddings using sentence-transformers ☆74 Updated 9 months ago
- Spider ported to Python ☆103 Updated 2 weeks ago
- Neural search engine for discovering semantically similar Python repositories on GitHub ☆29 Updated last year
- Loadable spellfix1 extension for SQLite as a Python package ☆27 Updated last year
- Vector Search Benchmarking suite ☆12 Updated 2 months ago
- ☆20 Updated 3 weeks ago
- Create an LLM XML context document from an llms.txt file ☆23 Updated last year
- Web scraping Page Objects core library ☆104 Updated last week
- Run embedding models using ONNX ☆35 Updated 2 years ago
- Turn natural language into commands. Your CLI tasks, now as easy as a conversation. Run it 100% offline, or use OpenAI's models. ☆63 Updated last year
- ☆11 Updated last year
- ATUI - Assistant Textual User Interface ☆23 Updated 2 years ago
- xargs for semgrep ☆28 Updated last year
- Use SQL to instantly query stories, users and other items from Hacker News. Open source CLI. No DB required. ☆18 Updated 3 months ago
- Python client for Zyte API ☆28 Updated 4 months ago
- The short and sweet way to create API clients in Python ☆25 Updated 4 months ago
- https://mimesniff.spec.whatwg.org/ implementation for Python ☆13 Updated 2 years ago
- pyppeteer stealth plugin, attempts to look like a normal browser ☆27 Updated last year
- Spider templates for automatic crawlers. ☆34 Updated last month
- A microservice for document conversion at scale ☆97 Updated 2 weeks ago
- GO GO EXPERIMENTAL LAB ☆17 Updated 2 weeks ago
- Functional composable pipelines allowing clean separation of the business logic and its implementation ☆11 Updated 5 months ago