apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
☆7,363 · Updated this week
Alternatives and similar repositories for crawlee-python
Users who are interested in crawlee-python are comparing it to the libraries listed below.
- Python scraper based on AI ☆22,142 · Updated last week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,256 · Updated 10 months ago
- Lightweight library for scraping web-sites with LLMs ☆1,252 · Updated 3 weeks ago
- 🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be! ☆8,405 · Updated this week
- Automate browser based workflows with AI ☆19,983 · Updated this week
- Turn any webpage into structured data using LLMs ☆6,151 · Updated last month
- Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data … ☆20,935 · Updated this week
- A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama ☆1,901 · Updated last month
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,927 · Updated 3 months ago
- OCR & Document Extraction using vision models ☆11,997 · Updated 7 months ago
- Lighter web automation with Python ☆8,193 · Updated last month
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,309 · Updated last month
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,060 · Updated 2 months ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆6,033 · Updated this week
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents… ☆2,963 · Updated 3 weeks ago
- The All in One Framework to Build Undefeatable Scrapers ☆3,533 · Updated this week
- A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documen… ☆3,281 · Updated this week
- Your agent in your terminal, equipped with local tools: writes code, uses the terminal, browses the web, vision. ☆4,135 · Updated this week
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆9,071 · Updated last year
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆7,496 · Updated 5 months ago
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,770 · Updated 3 weeks ago
- Rapidly build AI apps in Python ☆6,508 · Updated this week
- Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ☆6,387 · Updated this week
- ☆2,080 · Updated 9 months ago
- Large Action Model framework to develop AI Web Agents ☆6,250 · Updated 11 months ago
- Structured data extraction and instruction calling with ML, LLM and Vision LLM ☆5,084 · Updated 2 weeks ago
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,609 · Updated 8 months ago
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit… ☆6,184 · Updated 2 weeks ago
- LLM-powered multiagent persona simulation for imagination enhancement and business insights. ☆7,171 · Updated 3 weeks ago
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ☆2,329 · Updated 6 months ago