wjbmattingly / dots.ocr
Multilingual Document Layout Parsing in a Single Vision-Language Model
☆56 Updated 6 months ago
Alternatives and similar repositories for dots.ocr
Users who are interested in dots.ocr are comparing it to the libraries listed below.
- LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for cloud deployment. ☆122 Updated last year
- ☆114 Updated last year
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻‍🍳 ☆352 Updated 8 months ago
- ☆185 Updated 2 weeks ago
- Inference, fine-tuning, and many more recipes with the Gemma family of models. ☆279 Updated 6 months ago
- A CLI to estimate inference memory requirements for Hugging Face models, written in Python. ☆683 Updated last week
- OnnxTR: a docTR (Document Text Recognition) library ONNX pipeline wrapper for seamless, high-performing & accessible OCR. ☆171 Updated last week
- Inference and fine-tuning examples for vision models from 🤗 Transformers. ☆165 Updated 6 months ago
- Which model is the best at object detection? Which is best for small or large objects? We compare the results in a handy leaderboard. ☆99 Updated this week
- Simple UI for debugging correlations of text embeddings. ☆305 Updated 8 months ago
- Fine-tune Gemma 3 on an object detection task. ☆97 Updated 6 months ago
- Docling core data types and transformations. ☆225 Updated this week
- Recognition of handwritten text using CRAFT text detection and TrOCR. ☆26 Updated 3 years ago
- Lightweight, Python-based chat UI. ☆342 Updated 2 months ago
- ☆198 Updated 6 months ago
- Code examples showing how to use Gemini, Gemma, Imagen, and more. ☆50 Updated 3 weeks ago
- Checkbox Detection Model for Scanned Documents. ☆91 Updated 11 months ago
- Simple package to extract text with coordinates from programmatic PDFs. ☆238 Updated this week
- Join 15k builders to the Real-World ML Newsletter ⬇️⬇️⬇️ ☆47 Updated last year
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆77 Updated 6 months ago
- Extract structured text from PDFs quickly. ☆661 Updated 8 months ago
- TF-ID: Table/Figure IDentifier for academic papers. ☆245 Updated last year
- ☆188 Updated 6 months ago
- YOLOv10 trained on DocLayNet dataset. ☆80 Updated last year
- Luth is a state-of-the-art series of fine-tuned LLMs for French. ☆41 Updated 4 months ago
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆85 Updated last year
- ☆212 Updated 8 months ago
- A complete PyTorch implementation of Google's Gemma3 270M language model, featuring sliding window attention, RoPE positional encoding, a… ☆44 Updated 5 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆843 Updated last year
- Take your LLM to the optometrist. ☆46 Updated last week