zytedata / clear-html
Remove DIVs, style stuff and normalize HTML preserving structure information
☆11 Updated 5 months ago
Alternatives and similar repositories for clear-html
Users who are interested in clear-html are comparing it to the libraries listed below.
- The Web Scraping Club Free Repository ☆145 Updated 2 months ago
- Remote web browser automation. ☆19 Updated last year
- Common crawl extractor ☆78 Updated last year
- Spider ported to Python ☆87 Updated 5 months ago
- Via Text Density Simple Web Crawler With Go ☆12 Updated 2 years ago
- Create an LLM XML context document from an llms.txt file ☆21 Updated 11 months ago
- pyppeteer stealth plugin, attempts to look like a normal browser ☆22 Updated 9 months ago
- Scrapy extension that gives you all the scraping monitoring, alerting, scheduling, and data validation you will need straight out of the… ☆37 Updated last year
- ☆20 Updated last year
- aiohttp-like interface to chromium. based on selenium_driverless to bypass cloudflare ☆56 Updated 8 months ago
- LLM plugin for embeddings using sentence-transformers ☆70 Updated 3 months ago
- Simple implementation of a GPT (training and inference) in PyTorch. ☆12 Updated last year
- Python SDK for Browserbase ☆59 Updated this week
- httpx transport for curl_cffi (python bindings for curl-impersonate) ☆16 Updated 6 months ago
- This repository contains a Retrieval-Augmented Generation (RAG) framework developed in C++ for high performance and scalability, with CUD… ☆96 Updated 2 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 Updated 8 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆23 Updated 4 months ago
- Use AWS Lambda functions as a proxy pool to scrape web pages. ☆135 Updated last year
- A FastAPI extension for integrating common AI agent frameworks. ☆41 Updated 6 months ago
- ☆20 Updated 3 months ago
- 360M model running in the browser on WebGPU ☆22 Updated 11 months ago
- scraping and querying documents for LLMs ☆23 Updated last month
- Web scraping Page Objects core library ☆102 Updated 3 weeks ago
- xargs for semgrep ☆28 Updated last year
- ATUI - Assistant Textual User Interface ☆23 Updated last year
- Neural search engine for discovering semantically similar Python repositories on GitHub ☆28 Updated last year
- Some tough questions to test new models. ☆28 Updated last year
- Extract structured data from any unstructured web page ☆41 Updated last year
- Python client for Zyte API ☆26 Updated last month
- A microservice for document conversion at scale ☆73 Updated this week