apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
☆6,896 Updated this week
Alternatives and similar repositories for crawlee-python
Users who are interested in crawlee-python are comparing it to the libraries listed below.
- Rapidly build AI apps in Python ☆6,461 Updated 2 weeks ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,198 Updated 7 months ago
- Python scraper based on AI ☆21,594 Updated 2 weeks ago
- Build your own inference engine with expert control. Deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML. ☆3,591 Updated this week
- OCR & Document Extraction using vision models ☆11,882 Updated 5 months ago
- Turn any webpage into structured data using LLMs ☆6,055 Updated last month
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆5,860 Updated this week
- Automate browser-based workflows with LLMs and Computer Vision ☆14,615 Updated this week
- Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ☆5,707 Updated this week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,258 Updated last month
- Lightweight library for scraping web-sites with LLMs ☆1,229 Updated last week
- An open-source RAG-based tool for chatting with your documents. ☆24,520 Updated 3 months ago
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit… ☆5,488 Updated this week
- Structured data extraction and instruction calling with ML, LLM and Vision LLM ☆5,015 Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,299 Updated 5 months ago
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆19,953 Updated this week
- The fastest way to create an HTML app ☆6,656 Updated last week
- A language model programming library. ☆5,845 Updated 4 months ago
- A system for agentic LLM-powered data processing and ETL ☆3,001 Updated last week
- WebApps in pure Python. No JavaScript, HTML and CSS needed ☆3,281 Updated this week
- Turns Data and AI algorithms into production-ready web applications in no time. ☆18,809 Updated this week
- ⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡ ☆13,700 Updated last week
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents… ☆2,895 Updated last month
- The python library for real-time communication ☆4,343 Updated last month
- GenAI Agent Framework, the Pydantic way ☆12,960 Updated this week
- Build better UIs faster. ☆8,889 Updated 2 weeks ago
- Lightpanda: the headless browser designed for AI and automation ☆10,036 Updated this week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ☆54,747 Updated this week
- Lighter web automation with Python ☆8,054 Updated 5 months ago
- 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be! ☆7,468 Updated last week