CyberCRI / refinedoc
Python library for extracting headers, footers and body from PDF
☆20Updated 3 months ago
Alternatives and similar repositories for refinedoc
Users who are interested in refinedoc are comparing it to the libraries listed below.
- A better job search based on semantic matching☆17Updated last year
- Web application that converts audio and video to text using AI, supporting various formats and self-hosting.☆131Updated 8 months ago
- Graphy v1: A Realtime GraphRAG App using Langchain, Neo4j, GPT-4o, and Streamlit.☆69Updated last year
- Python Wrapper on top of Unofficial Medium API to quickly extract data from Medium's website.☆58Updated 5 months ago
- Production-ready Python library for multi-provider LLM orchestration☆40Updated 2 months ago
- ☆48Updated 10 months ago
- 🚀 Leverage the power of LLM to improve your resume. Build a Streamlit application powered by Langchain, OpenAI and Google Generative AI.☆40Updated last year
- ☆33Updated 4 months ago
- A lightweight Amazon scraper library.☆77Updated 7 months ago
- Search for words, documents, images, videos, news and maps using the Brave search engine. Downloading files and images to a local hard dr…☆78Updated 5 months ago
- Airbnb scraper made in Python☆103Updated 2 weeks ago
- Turn Webpage to LLM friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web scraping, image & webpage links extr…☆265Updated last month
- A Python package for converting PDFs to markdown while extracting images and tables, generate descriptive text descriptions for extracted…☆177Updated 5 months ago
- Effortlessly extract information from unstructured data with this library, utilizing advanced AI techniques. Compose AI in customizable p…☆87Updated last year
- Create a music review RAG application with Neo4j☆22Updated last year
- Audio to summary with openAI Whisper & GPT 3.5/4 using streamlit☆61Updated 2 years ago
- This repository contains a Python script that allows users to download the audio from a YouTube video, transcribe it into text, detect th…☆154Updated 8 months ago
- An LLM GUI application; enables you to interact with your files, offering dynamic parameters that can modify response behavior during run…☆95Updated last month
- Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.☆296Updated 8 months ago
- ScraperAI is an open-source, AI-powered tool designed to simplify web scraping for users of all skill levels.☆232Updated 3 months ago
- The Python toolkit for converting Reddit threads into organized text data. Extract and process Reddit content with ease!☆114Updated last year
- Open-Source RAG app with LLM Observability (Langfuse), support for 100+ providers (LiteLLM), Dockerized, Full Type-checking, 100% Test co…☆157Updated 2 weeks ago
- PDF text data extraction web app with OCR for scanned documents☆94Updated last year
- AI + Legal APIs: A Tool-Based Retrieval Augmented Generation Workbench for Legal AI UX Research.☆119Updated last year
- An advanced retrieval system that combines semantic vector search with token-based search, using contextual chunking and knowledge graphs…☆45Updated last year
- PDFstract - A Conversion and OCR benchmarking solution - Soon to be a Unified Pipeline for PDF data extraction. CLI and GUI☆56Updated 3 weeks ago
- Chat with your Documents (PDF, TXT, DOCX, ODT, PPTX etc), Websites and Youtube Chat too!, CSV files. Uses langchain, Ollama, Groq, Gemini,…☆55Updated last year
- Completely local RAG. Chat with your PDF documents (with open LLM) and a UI that uses LangChain, Streamlit, Ollama (Llama 3.1), Qdrant a…☆121Updated last year
- Automatically generate engaging AI podcasts from nothing but an episode title.☆140Updated 5 months ago
- Web scraping framework built for AI applications. Extract clean, structured content from any website with dynamic content handling, markd…☆50Updated 3 months ago