PRITHIVSAKTHIUR / OCR-ReportLab-Notebooks
Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B, and more) on a free-tier T4 GPU
☆20 · Updated last month
Alternatives and similar repositories for OCR-ReportLab-Notebooks
Users who are interested in OCR-ReportLab-Notebooks are comparing it to the libraries listed below.
- ☆45 · Updated this week
- Research on accelerating real-world deployment of GOT-OCR, for any language · ☆61 · Updated 10 months ago
- GLM Series Edge Models · ☆149 · Updated 3 months ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence · ☆146 · Updated last year
- ☆99 · Updated 8 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. · ☆73 · Updated 2 months ago
- (ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation · ☆86 · Updated 2 months ago
- Cook up amazing multimodal AI applications effortlessly with MiniCPM-o · ☆153 · Updated last week
- A simple MLLM that surpasses QwenVL-Max using only open-source data, built on a 14B LLM. · ☆38 · Updated last year
- ☆93 · Updated last month
- A third-party component library based on Gradio. Integrates Ant Design, Ant Design X, and more advanced components to help you build appl… · ☆117 · Updated this week
- Building LLaMA 4 MoE from Scratch · ☆63 · Updated 5 months ago
- A pipeline-parallel training script for LLMs. · ☆158 · Updated 4 months ago
- ☆28 · Updated 11 months ago
- Composition of Multimodal Language Models From Scratch · ☆15 · Updated last year
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth · ☆171 · Updated last week
- A novel multimodal (vision) RAG architecture · ☆29 · Updated 11 months ago
- ☆102 · Updated last year
- Official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" · ☆154 · Updated last year
- AnyModal is a Flexible Multimodal Language Model Framework for PyTorch · ☆103 · Updated 8 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs · ☆92 · Updated 4 months ago
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] · ☆85 · Updated 7 months ago
- A Unified Toolkit for Deep Learning-Based Table Extraction · ☆49 · Updated 9 months ago
- ComoRAG is a Retrieval-Augmented Generation (RAG) system for long documents and multi-document QA, information extraction, and knowledge … · ☆229 · Updated 2 weeks ago
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES" · ☆110 · Updated 2 months ago
- A 500M-parameter VLM that runs in real time on CPU, surpassing Moondream2 and SmolVLM. Easy to train from scratch. · ☆234 · Updated 4 months ago
- Reading order, Layoutreader · ☆17 · Updated 4 months ago
- My implementation of Kosmos2.5 from the paper: "KOSMOS-2.5: A Multimodal Literate Model" · ☆73 · Updated this week
- PresentAgent: Multimodal Agent for Presentation Video Generation · ☆98 · Updated last month
- Our 2nd-gen LMM · ☆34 · Updated last year