apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
☆7,583 · Updated this week
Alternatives and similar repositories for crawlee-python
Users that are interested in crawlee-python are comparing it to the libraries listed below
- Turn any webpage into structured data using LLMs ☆6,168 · Updated last month
- Python scraper based on AI ☆22,357 · Updated last week
- Rapidly build AI apps in Python ☆6,516 · Updated this week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,269 · Updated 11 months ago
- LLM-powered multiagent persona simulation for imagination enhancement and business insights. ☆7,190 · Updated last month
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit… ☆6,278 · Updated 2 weeks ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆6,068 · Updated this week
- Automate browser based workflows with AI ☆20,181 · Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,708 · Updated 8 months ago
- 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be! ☆8,852 · Updated this week
- Lightweight library for scraping web-sites with LLMs ☆1,260 · Updated last month
- Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision. ☆4,148 · Updated last week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,314 · Updated 2 months ago
- Large Action Model framework to develop AI Web Agents ☆6,275 · Updated last year
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. ☆27,816 · Updated 3 months ago
- AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording ☆16,464 · Updated last week
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ☆2,817 · Updated this week
- Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. … ☆32,312 · Updated 3 weeks ago
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. ☆3,792 · Updated 3 weeks ago
- OCR & Document Extraction using vision models ☆12,032 · Updated 8 months ago
- Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ☆6,470 · Updated 3 weeks ago
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ☆58,878 · Updated this week
- Turn websites into clean data pipelines & structured APIs in minutes! ☆14,164 · Updated this week
- A language model programming library. ☆5,877 · Updated 7 months ago
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆7,516 · Updated 6 months ago
- Stay on top of trending topics on social media and the web with AI ☆3,944 · Updated 11 months ago
- An open-source RAG-based tool for chatting with your documents. ☆24,873 · Updated 6 months ago
- An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Co… ☆5,905 · Updated last month
- PraisonAI is a production-ready Multi AI Agents framework, designed to create AI Agents to automate and solve problems ranging from simpl… ☆5,567 · Updated this week
- The first AI agent that builds permissionless integrations through reverse engineering platforms' internal APIs. ☆4,527 · Updated 5 months ago