apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
☆7,221 · Updated this week
Alternatives and similar repositories for crawlee-python
Users who are interested in crawlee-python are comparing it to the libraries listed below.
- Python scraper based on AI ☆21,906 · Updated this week
- Turn any webpage into structured data using LLMs ☆6,110 · Updated 3 weeks ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,234 · Updated 9 months ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆5,963 · Updated this week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,285 · Updated 2 months ago
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit… ☆5,995 · Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,418 · Updated 6 months ago
- 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be! ☆8,247 · Updated this week
- Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ☆6,128 · Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆18,906 · Updated last month
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆20,683 · Updated this week
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents… ☆2,940 · Updated 2 months ago
- AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording ☆16,023 · Updated 2 months ago
- Rapidly build AI apps in Python ☆6,481 · Updated last week
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆6,694 · Updated 4 months ago
- OCR & Document Extraction using vision models ☆11,968 · Updated 6 months ago
- Lightweight library for scraping web-sites with LLMs ☆1,239 · Updated last month
- Large Action Model framework to develop AI Web Agents ☆6,209 · Updated 10 months ago
- A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama ☆1,892 · Updated last week
- PraisonAI is a production-ready Multi AI Agents framework, designed to create AI Agents to automate and solve problems ranging from simpl… ☆5,487 · Updated 3 weeks ago
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ☆2,763 · Updated last week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ☆56,514 · Updated this week
- Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more. ☆3,715 · Updated this week
- The python library for real-time communication ☆4,420 · Updated this week
- A language model programming library. ☆5,849 · Updated 5 months ago
- Python APIs for web automation, testing, and bypassing bot-detection with ease. ☆11,914 · Updated last week
- 🪄 Create rich visualizations with AI ☆14,389 · Updated last week
- Turn any website into clean, contextualized data pipelines for your workflows ☆13,919 · Updated last week
- The easiest way to use Agentic RAG in any enterprise ☆4,362 · Updated 10 months ago
- LLM-powered multiagent persona simulation for imagination enhancement and business insights. ☆7,138 · Updated 3 months ago