clulab / pdf2txt
Convert PDF files to TXT
☆36 Updated 2 years ago
Alternatives and similar repositories for pdf2txt
Users who are interested in pdf2txt are comparing it to the libraries listed below.
- multimodal document analysis ☆166 Updated 2 months ago
- Logical structure analysis for visually structured documents ☆93 Updated 3 years ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆137 Updated 2 years ago
- An index of PDF-centric corpora ☆161 Updated 7 months ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆84 Updated last year
- GLiNER model in a FastAPI microservice. ☆47 Updated last year
- ☆83 Updated 3 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆74 Updated last year
- ☆201 Updated this week
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated last year
- Large-language Model Evaluation framework with Elo Leaderboard and A-B testing ☆52 Updated last year
- Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr… ☆34 Updated 5 months ago
- ☆61 Updated last year
- 🦦 weasel: A small and easy workflow system ☆90 Updated 2 months ago
- Python library to use Pleias-RAG models ☆68 Updated 9 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆72 Updated last year
- minimal pytorch implementation of bm25 (with sparse tensors) ☆104 Updated 3 months ago
- A Python library to chunk/group your texts based on semantic similarity. ☆103 Updated last year
- Small python package to measure OCR quality and other related metrics. ☆26 Updated last year
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆23 Updated last year
- Pretraining Efficiently on S2ORC! ☆179 Updated last year
- Efficient few-shot learning with cross-encoders. ☆62 Updated last year
- Guideline following Large Language Model for Information Extraction ☆426 Updated last year
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆82 Updated last year
- Pre-train Static Word Embeddings ☆94 Updated 5 months ago
- Benchmark various LLM Structured Output frameworks: Instructor, Mirascope, Langchain, LlamaIndex, Fructose, Marvin, Outlines, etc on task… ☆184 Updated last year
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆58 Updated last year
- This is the repo for the container that holds the models for the text2vec-transformers module ☆60 Updated 3 months ago
- Robust and fast topic models with sentence-transformers. ☆89 Updated last week
- ☆59 Updated last year