[CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding
☆36Jul 22, 2025Updated 7 months ago
Alternatives and similar repositories for Docopilot
Users who are interested in Docopilot are comparing it to the libraries listed below
- ☆21Nov 17, 2025Updated 4 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …☆91Nov 15, 2024Updated last year
- ☆11Oct 31, 2024Updated last year
- Video Benchmark Suite: Rapid Evaluation of Video Foundation Models☆16Jan 10, 2025Updated last year
- ☆14Jan 26, 2025Updated last year
- Code and Data for "FaithfulRAG: Fact-Level Conflict Modeling for Context-Faithful Retrieval-Augmented Generation" (ACL25)☆29Oct 26, 2025Updated 4 months ago
- PyTorch implementation of the article "Generative Adversarial Network for Handwritten Text"☆10Nov 13, 2023Updated 2 years ago
- This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy☆50Oct 16, 2024Updated last year
- The official repository of MM-R5☆29Jun 22, 2025Updated 8 months ago
- [ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.☆14Mar 2, 2024Updated 2 years ago
- ☆61Aug 5, 2025Updated 7 months ago
- code for A Large-scale Dataset for Audio-Language Representation Learning☆14Sep 18, 2024Updated last year
- Official release of Genos models.☆22Jan 30, 2026Updated last month
- ☆24Feb 4, 2026Updated last month
- Notes on Hung-yi Lee's Machine Learning 2021 course☆14Nov 27, 2022Updated 3 years ago
- ☆11Apr 26, 2019Updated 6 years ago
- Large-scale text embedding model☆38Sep 6, 2025Updated 6 months ago
- serverless vscode webide☆17Apr 14, 2023Updated 2 years ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference☆18Jun 19, 2025Updated 9 months ago
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e…☆11Dec 27, 2024Updated last year
- Local DeepSearch (Advantage: Low Threshold): an implementation of Agentic RAG based on DeepSeek-R1 API and Tavily API☆17Jun 21, 2025Updated 8 months ago
- ☆19May 19, 2024Updated last year
- Handwriting Trajectory Recovery using End-to-End Deep Encoder-Decoder Network, ICPR 2018.☆15Jul 17, 2019Updated 6 years ago
- Tongji University resume template, with a few minor localization changes (generated from fky2015/resume-ng)☆14Dec 3, 2023Updated 2 years ago
- ☆22Sep 16, 2025Updated 6 months ago
- Cross-modal Reinforced Prompting for Graph and Language Tasks, KDD 2024.☆11Sep 29, 2024Updated last year
- Just prepare config file and start training your metric learning model with ease☆16Apr 2, 2024Updated last year
- FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models☆12Dec 21, 2025Updated 2 months ago
- The official repo for the DanQing dataset.☆31Jan 16, 2026Updated 2 months ago
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams☆47Updated this week
- Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"☆127Jan 6, 2026Updated 2 months ago
- [ACM MM25] Official PyTorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]☆15Jul 15, 2025Updated 8 months ago
- [🎖️1st place (Minister's Award) solution] 2022 National Institute of Korean Language AI Language Ability Evaluation (Aspect-Based Sentiment Analysis of shopping mall review data)☆11Jun 6, 2023Updated 2 years ago
- ☆18Mar 19, 2023Updated 3 years ago
- EMNLP 2024 | Style-Specific Neurons for Steering LLMs in Text Style Transfer☆13Mar 23, 2025Updated 11 months ago
- ☆38Jan 9, 2026Updated 2 months ago
- Data preparation code for building Kaldi ASR system☆14Mar 18, 2017Updated 9 years ago
- [NAACL 2024] TabSQLify: Enhancing Reasoning Capabilities of LLMs Through Table Decomposition☆17Jan 5, 2026Updated 2 months ago
- ICCV 2025: Official Implementation of "Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced L…☆64Oct 25, 2025Updated 4 months ago