apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
☆5,962 · Updated last week
Alternatives and similar repositories for crawlee-python
Users who are interested in crawlee-python are comparing it to the libraries listed below.
- Python scraper based on AI ☆20,595 · Updated 3 weeks ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,045 · Updated 5 months ago
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆4,163 · Updated 5 months ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆5,506 · Updated this week
- PraisonAI is a production-ready Multi AI Agents framework, designed to create AI Agents to automate and solve problems ranging from simpl… ☆5,196 · Updated this week
- Agent Framework / shim to use Pydantic with LLMs ☆11,275 · Updated this week
- OCR & Document Extraction using vision models ☆11,603 · Updated 2 months ago
- Automate browser-based workflows with LLMs and Computer Vision ☆13,908 · Updated this week
- Rapidly build AI apps in Python ☆6,369 · Updated last month
- The easiest way to use Agentic RAG in any enterprise ☆4,288 · Updated 6 months ago
- The easiest way to deploy agents, MCP servers, models, RAG, pipelines and more. No MLOps. No YAML. ☆3,398 · Updated this week
- Large Action Model framework to develop AI Web Agents ☆6,101 · Updated 6 months ago
- Full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning. ☆30,807 · Updated this week
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ☆2,720 · Updated this week
- Task-Aware Agent-driven Prompt Optimization Framework ☆3,434 · Updated 2 weeks ago
- 🪄 Create rich visualizations with AI ☆12,745 · Updated this week
- A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API. ☆9,020 · Updated this week
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆8,236 · Updated 6 months ago
- Turn any webpage into structured data using LLMs ☆5,857 · Updated 2 months ago
- A powerful framework for building realtime voice AI agents 🤖🎙️📹 ☆6,872 · Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,004 · Updated 2 months ago
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,640 · Updated last month
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,882 · Updated this week
- The fastest way to create an HTML app ☆6,545 · Updated this week
- The python library for real-time communication ☆4,168 · Updated this week
- 🔥 Open-source no code web data extraction platform. Instantly turn any website into API or spreadsheet 🔥 ☆13,303 · Updated this week
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents… ☆2,768 · Updated last week
- Lightweight library for scraping web-sites with LLMs ☆1,129 · Updated last month
- A language model programming library. ☆5,801 · Updated last month
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser instance that lets you automate the web wi… ☆4,867 · Updated this week