TF-ID: Table/Figure IDentifier for academic papers
☆244 Jul 12, 2024 Updated last year
Alternatives and similar repositories for TF-ID
Users who are interested in TF-ID are comparing it to the libraries listed below.
- ☆20 Jan 27, 2024 Updated 2 years ago
- Small python package to measure OCR quality and other related metrics. ☆27 Feb 19, 2024 Updated 2 years ago
- Extract structured text from pdfs quickly ☆670 Jun 11, 2025 Updated 8 months ago
- ☆73 Jul 14, 2024 Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 Dec 4, 2025 Updated 3 months ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆9,433 Jan 3, 2025 Updated last year
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,868 May 17, 2025 Updated 9 months ago
- tiny_fnc_engine is a minimal python library that provides a flexible engine for calling functions extracted from a LLM. ☆38 Sep 11, 2024 Updated last year
- Detect and extract tables to markdown and csv ☆754 Jan 24, 2025 Updated last year
- anything you want can be built with morph cloud ☆27 Oct 14, 2025 Updated 4 months ago
- ☆17 Jan 23, 2021 Updated 5 years ago
- ☆50 Mar 14, 2024 Updated last year
- Huggingface deployment for FastHTML ☆36 Sep 13, 2024 Updated last year
- ReadingBank: A Benchmark Dataset for Reading Order Detection ☆117 Aug 26, 2024 Updated last year
- Fast, High-Fidelity LLM Decoding with Regex Constraints ☆21 Jul 26, 2024 Updated last year
- ☆20 Jan 3, 2024 Updated 2 years ago
- Chinese Mathematical Formula Detection (MFD) Dataset 中文文档数学公式检测数据集 ☆34 Dec 21, 2022 Updated 3 years ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ☆2,370 May 30, 2025 Updated 9 months ago
- ☆67 Mar 4, 2024 Updated 2 years ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆59 Jan 5, 2026 Updated 2 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,602 Dec 20, 2025 Updated 2 months ago
- Efficient vector database for hundred millions of embeddings. ☆212 May 17, 2024 Updated last year
- Math OCR model that outputs LaTeX and markdown ☆1,113 Jan 29, 2025 Updated last year
- Explore 160+ notebook visual analytics tools in your browser! ☆67 Mar 29, 2024 Updated last year
- ☆249 Jan 22, 2023 Updated 3 years ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,392 Mar 1, 2026 Updated last week
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆275 Dec 6, 2025 Updated 3 months ago
- A Language and Live Runtime for Styling and Labeling Typeset Math Formulas ☆26 Oct 29, 2023 Updated 2 years ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆2,542 Mar 1, 2026 Updated last week
- ☆102 Dec 23, 2024 Updated last year
- minimal pytorch implementation of bm25 (with sparse tensors) ☆104 Oct 28, 2025 Updated 4 months ago
- ☆94 Jul 4, 2025 Updated 8 months ago
- ☆15 Apr 26, 2025 Updated 10 months ago
- Seamless Voice Interactions with LLMs ☆12 Oct 28, 2023 Updated 2 years ago
- High-performance tokenized language data-loader for Python C++ extension ☆14 Jul 22, 2024 Updated last year
- DB-based Optical Chemical Structure Recognition ☆12 Sep 12, 2022 Updated 3 years ago
- [Corca / OR] Solver for Multi-dimensional Multi-demand Quadratic Knapsack Problems ☆12 Mar 22, 2022 Updated 3 years ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,935 Updated this week
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ☆23 Mar 4, 2024 Updated 2 years ago