CyberCRI / refinedoc
Python library for extracting headers, footers and body from PDF
☆21Updated 5 months ago
Alternatives and similar repositories for refinedoc
Users that are interested in refinedoc are comparing it to the libraries listed below
- A better job search based on semantic matching☆17Updated last year
- Turn Webpage to LLM friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, image & webpage links extr…☆275Updated 2 weeks ago
- Spider ported to Python☆103Updated last week
- PDF text data extraction web app with OCR for scanned documents☆95Updated last year
- Graphy v1: A Realtime GraphRAG App using Langchain, Neo4j, GPT-4o, and Streamlit.☆73Updated last year
- Airbnb scraper made in Python☆109Updated last month
- Python Wrapper on top of Unofficial Medium API to quickly extract data from Medium's website.☆61Updated 6 months ago
- 🔌 Want one client library for all your embeddings? 💙 Choose Catsu! 🐱☆55Updated 2 weeks ago
- OCR with Google's AI technology (Cloud Vision API)☆77Updated 2 years ago
- Web scraping framework built for AI applications. Extract clean, structured content from any website with dynamic content handling, markd…☆52Updated 4 months ago
- Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable p…☆87Updated last year
- Scrapfly Python SDK for headless browsers and proxy rotation☆50Updated 3 weeks ago
- Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text,…☆107Updated last year
- ☆21Updated last year
- Fully automated AI based web scraping.☆33Updated 2 weeks ago
- larry.ai: A Batteries Included ChatGPT Frontend Framework & HTTP Proxy☆17Updated 2 years ago
- This master's thesis project is based on OpenAI Whisper, with the goal of transcribing interviews☆48Updated last year
- Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.☆302Updated 9 months ago
- Collection of PDF parsing libraries like AI based docling, claude, openai, gemini, meta's llama-vision, unstructured-io, and pdfminer, py…☆165Updated 5 months ago
- Audio to summary with openAI Whisper & GPT 3.5/4 using streamlit☆62Updated 2 years ago
- Run Python code on n8n☆214Updated 2 years ago
- AI + Legal APIs: A Tool-Based Retrieval Augmented Generation Workbench for Legal AI UX Research.☆121Updated last year
- This repository contains a Python script that allows users to download the audio from a YouTube video, transcribe it into text, detect th…☆155Updated 9 months ago
- Advanced AI email assistant using Groq for responsive replies, Llama for contextual information retrieval, and RAG with LangChain for enh…☆51Updated last year
- Web application that converts audio and video to text using AI, supporting various formats and self-hosting.☆128Updated 9 months ago
- Production-ready Python library for multi-provider LLM orchestration☆40Updated 3 months ago
- Extensible API and framework to build your Retrieval Augmented Generation (RAG) and Information Extraction (IE) applications with LLMs☆32Updated last month
- 😎 Awesome list of tools and projects with the awesome LangChain framework☆19Updated 2 years ago
- Automate ChatGPT using Selenium without the API☆75Updated last year
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment.☆121Updated last year