apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
☆5,651 · Updated this week
Alternatives and similar repositories for crawlee-python
Users interested in crawlee-python compare it with the libraries listed below.
- Python scraper based on AI ☆19,630 · Updated this week
- Rapidly build AI apps in Python ☆6,247 · Updated 2 weeks ago
- Turn any webpage into structured data using LLMs ☆4,862 · Updated 8 months ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆5,258 · Updated this week
- Open source Claude Artifacts – built with Llama 3.1 405B ☆6,014 · Updated last month
- Lightpanda: the headless browser designed for AI and automation ☆8,919 · Updated this week
- Automate browser-based workflows with LLMs and Computer Vision ☆13,368 · Updated this week
- Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet and cites it too. … ☆7,971 · Updated last week
- A language model programming library. ☆5,759 · Updated 2 months ago
- Large Action Model framework to develop AI Web Agents ☆6,052 · Updated 3 months ago
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆8,723 · Updated last week
- The Open-Source Visual Vibecoding Editor – Visually build, style, and edit your React App with AI ☆9,458 · Updated this week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,060 · Updated 2 months ago
- An AI web browsing framework focused on simplicity and extensibility. ☆11,886 · Updated this week
- Agent Framework / shim to use Pydantic with LLMs ☆9,498 · Updated this week
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆17,692 · Updated this week
- A powerful framework for building realtime voice AI agents 🤖🎙️📹 ☆5,988 · Updated this week
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser instance that lets you automate the web wi… ☆4,334 · Updated this week
- 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be! ☆5,170 · Updated this week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆6,416 · Updated 2 months ago
- 🔥 Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥 ☆12,613 · Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆38,358 · Updated this week
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,162 · Updated this week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ☆43,381 · Updated this week
- 🚀 The fast, Pythonic way to build MCP servers and clients ☆9,835 · Updated this week
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ☆2,140 · Updated this week
- structured outputs for llms ☆10,443 · Updated this week
- Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ☆3,537 · Updated 2 weeks ago
- Turns Data and AI algorithms into production-ready web applications in no time. ☆18,089 · Updated this week
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. ☆24,296 · Updated 2 weeks ago