apify / crawlee-python
Crawlee: A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP, in both headful and headless mode, with proxy rotation.
☆7,113 · Updated this week
Alternatives and similar repositories for crawlee-python
Users who are interested in crawlee-python are comparing it to the libraries listed below.
- Rapidly build AI apps in Python ☆6,470 · Updated last month
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,216 · Updated 8 months ago
- Python scraper based on AI ☆21,678 · Updated 2 weeks ago
- An open-source RAG-based tool for chatting with your documents. ☆24,597 · Updated 4 months ago
- Automate browser-based workflows with AI ☆17,115 · Updated this week
- Turn any webpage into structured data using LLMs ☆6,083 · Updated 2 weeks ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆5,921 · Updated this week
- Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ☆5,958 · Updated this week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,269 · Updated 2 months ago
- Crawlee: A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆20,474 · Updated this week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,725 · Updated 4 months ago
- The easiest way to use Agentic RAG in any enterprise ☆4,348 · Updated 9 months ago
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit… ☆5,945 · Updated this week
- LLM-powered multiagent persona simulation for imagination enhancement and business insights. ☆7,108 · Updated 2 months ago
- Large Action Model framework to develop AI Web Agents ☆6,190 · Updated 9 months ago
- Lightweight library for scraping websites with LLMs ☆1,235 · Updated 3 weeks ago
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. ☆27,579 · Updated last month
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ☆2,758 · Updated last week
- Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more.… ☆2,483 · Updated this week
- ⚡️ GenBI (Generative BI) queries any database in natural language, generates accurate SQL (Text-to-SQL), charts (Text-to-Chart), and AI-p… ☆12,897 · Updated this week
- A visual playground for agentic workflows: Iterate over your agents 10x faster ☆5,575 · Updated 3 months ago
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,904 · Updated last month
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,355 · Updated 6 months ago
- PraisonAI is a production-ready Multi AI Agents framework, designed to create AI Agents to automate and solve problems ranging from simpl… ☆5,468 · Updated last week
- A self-organizing file system with llama 3 ☆5,673 · Updated 3 months ago
- Lightpanda: the headless browser designed for AI and automation ☆10,237 · Updated last week
- 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be! ☆8,084 · Updated last week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ☆55,537 · Updated this week
- Uncomplicated Observability for Python and beyond! 🪵🔥 ☆3,733 · Updated this week
- Agent Framework For Fintech ☆7,673 · Updated this week