apify/crawlee-python

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

☆9,338

Alternatives and similar repositories for crawlee-python

Users who are interested in crawlee-python are comparing it to the libraries listed below.

ScrapeGraphAI / Scrapegraph-ai
Python scraper based on AI
☆28,455 · Updated this week
apify / crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data …
☆24,790 · Updated this week
stanford-oval / storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
☆30,138 · Sep 30, 2025 · Updated 9 months ago
unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
☆73,163 · Updated this week
agno-agi / agno
Build, run, and manage agent platforms.
☆41,224 · Updated this week
Skyvern-AI / skyvern
Automate browser based workflows with AI
☆22,497 · Updated this week
mem0ai / mem0
Universal memory layer for AI Agents
☆61,114 · Updated this week
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,113 · Updated this week
Cinnamon / kotaemon
An open-source RAG-based tool for chatting with your documents.
☆25,563 · Updated this week
docling-project / docling
Get your documents ready for gen AI
☆63,447 · Updated this week
QuivrHQ / MegaParse
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
☆7,401 · Feb 21, 2025 · Updated last year
browserbase / stagehand
The SDK For Browser Agents
☆23,550 · Updated this week
assafelovic / gpt-researcher
An autonomous agent that conducts deep research on any data using any LLM providers
☆28,387 · Updated this week
TabbyML / tabby
Self-hosted AI coding assistant
☆33,730 · Jun 30, 2026 · Updated 2 weeks ago
khoj-ai / khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. …
☆35,853 · Jun 24, 2026 · Updated 3 weeks ago
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,621 · Jul 7, 2026 · Updated last week
getmaxun / maxun
🔥 The open-source no-code platform for web scraping, crawling, search and AI data extraction • Turn websites into structured APIs in min…
☆16,570 · Updated this week
browser-use / browser-use
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
☆105,458 · Updated this week
ItzCrazyKns / Vane
Vane is an AI-powered answering engine.
☆35,693 · Apr 11, 2026 · Updated 3 months ago
fishaudio / fish-speech
SOTA Open Source TTS
☆31,306 · Jun 9, 2026 · Updated last month
firecrawl / firecrawl
The API to search, scrape, and interact with the web at scale. 🔥
☆152,864 · Updated this week
Zipstack / unstract
LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
☆6,707 · Updated this week
crewAIInc / crewAI
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work t…
☆55,736 · Updated this week
Avaiga / taipy
Turns Data and AI algorithms into production-ready web applications in no time.
☆19,282 · Jun 21, 2026 · Updated 3 weeks ago
teableio / teable
✨ AI Spreadsheet for Business
☆21,503 · Updated this week
vanna-ai / vanna
🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄.
☆23,782 · Feb 2, 2026 · Updated 5 months ago
lavague-ai / LaVague
Large Action Model framework to develop AI Web Agents
☆6,381 · Jan 21, 2025 · Updated last year
jina-ai / reader
Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
☆11,666 · May 22, 2026 · Updated last month
OpenHands / OpenHands
🙌 OpenHands: AI-Driven Development
☆81,202 · Updated this week
roboflow / supervision
We write your reusable computer vision tools. 💜
☆48,101 · Updated this week
Mintplex-Labs / anything-llm
Stop renting your intelligence. Own it with AnythingLLM. Everything you need for a powerful local-first agent experience
☆63,495 · Updated this week
Aider-AI / aider
aider is AI pair programming in your terminal
☆47,492 · May 22, 2026 · Updated last month
MervinPraison / PraisonAI
PraisonAI 🦞 — Hire a 24/7 AI Workforce. Stop writing boilerplate and start shipping autonomous self-improving agents that research, plan…
☆8,483 · Updated this week
microsoft / graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
☆34,495 · Updated this week
reflex-dev / reflex
🕸️ Web apps in pure Python 🐍
☆28,657 · Updated this week
mishushakov / llm-scraper
Turn any webpage into structured data using LLMs
☆6,878 · Jun 15, 2026 · Updated last month
screenpipe / screenpipe
YC (S26) | Record your screen 24/7 and plug into your agents. Local, private, secure. Connect to OpenClaw, Hermes agent and 100+ apps
☆20,292 · Updated this week
BerriAI / litellm
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a…
☆53,941 · Updated this week
mesop-dev / mesop
Rapidly build AI apps in Python
☆6,589 · Jul 11, 2026 · Updated last week