huridocs / pdf_paragraphs_extraction
☆49 Updated 9 months ago
Alternatives and similar repositories for pdf_paragraphs_extraction:
Users who are interested in pdf_paragraphs_extraction are comparing it to the libraries listed below.
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model" ☆73 Updated 3 weeks ago
- Deployment of a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embedding models. ☆42 Updated 9 months ago
- ☆22 Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆76 Updated 6 months ago
- Fine-tuning LLM and embedding models ☆27 Updated last year
- I have explained how to create a superior RAG pipeline for complex PDFs using LlamaParse. We can extract text and tables from PDFs and QA on… ☆44 Updated last year
- Python package to parse PDFs with different parsers ☆35 Updated 4 months ago
- ☆37 Updated last week
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] ☆82 Updated 3 months ago
- Parsee's PDF reader, specialized on the extraction of tables with numeric values and the accurate extraction and preservation of text-par… ☆58 Updated 2 months ago
- A novel multi-modality (Vision) RAG architecture ☆25 Updated 6 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆125 Updated last year
- ☆177 Updated last week
- TextEmbed is a REST API crafted for high-throughput and low-latency embedding inference. It accommodates a wide variety of embedding mode… ☆23 Updated 7 months ago
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models" ☆98 Updated last year
- Code implementation repository for the paper HiQA ☆100 Updated last month
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆48 Updated 2 months ago
- ☆51 Updated 9 months ago
- ☆60 Updated last year
- Multimodal document analysis ☆164 Updated 10 months ago
- ☆26 Updated 6 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆64 Updated 4 months ago
- Evaluation for AI apps and agents ☆40 Updated last year
- ☆74 Updated last year
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping". ☆225 Updated 7 months ago
- Trained Detectron2 object detection models for document layout analysis based on the PubLayNet dataset ☆27 Updated 2 years ago
- A lightweight script for converting HTML pages to Markdown format with support for code blocks ☆79 Updated last year
- ☆38 Updated last year
- Benchmark baseline for retrieval QA applications ☆109 Updated last year
- ☆19 Updated last year