apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
☆6,322 · Updated this week
Alternatives and similar repositories for crawlee-python
Users who are interested in crawlee-python are comparing it to the libraries listed below.
- Python scraper based on AI ☆21,394 · Updated last month
- Automate browser-based workflows with LLMs and Computer Vision ☆14,476 · Updated this week
- Turn any webpage into structured data using LLMs ☆6,037 · Updated 2 weeks ago
- Rapidly build AI apps in Python ☆6,444 · Updated 3 months ago
- Large Action Model framework to develop AI Web Agents ☆6,179 · Updated 8 months ago
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,248 · Updated 4 months ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,182 · Updated 7 months ago
- Lightpanda: the headless browser designed for AI and automation ☆9,814 · Updated this week
- Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision. ☆4,017 · Updated this week
- AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording ☆15,695 · Updated 3 weeks ago
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit… ☆5,165 · Updated this week
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆19,563 · Updated this week
- 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be! ☆7,361 · Updated this week
- A language model programming library. ☆5,846 · Updated 3 months ago
- The AI Browser Automation Framework ☆17,236 · Updated this week
- Lighter web automation with Python ☆8,031 · Updated 5 months ago
- Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ☆5,304 · Updated this week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,698 · Updated 3 months ago
- The first AI agent that builds permissionless integrations through reverse engineering platforms' internal APIs. ☆4,464 · Updated last month
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. ☆27,456 · Updated 3 months ago
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,250 · Updated 3 weeks ago
- A framework for Claude Opus to intelligently orchestrate subagents. ☆4,280 · Updated last year
- GenAI Agent Framework, the Pydantic way ☆12,695 · Updated this week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ☆53,894 · Updated this week
- The fastest way to create an HTML app ☆6,650 · Updated last week
- 🔍 AI search engine - self-host with local or cloud LLMs ☆3,456 · Updated last year
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,868 · Updated this week
- ⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡ ☆13,663 · Updated this week
- Infinite Bookshelf: Generate entire books in seconds using Groq and Llama3 ☆1,341 · Updated last month
- Build better UIs faster. ☆8,876 · Updated 3 weeks ago