Official Repository of OmniCaptioner
☆169Apr 23, 2025Updated 10 months ago
Alternatives and similar repositories for OmniCaptioner
Users that are interested in OmniCaptioner are comparing it to the libraries listed below
Sorting:
- InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery☆1,015Feb 14, 2026Updated 2 weeks ago
- Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs☆45Jun 17, 2025Updated 8 months ago
- ☆27Mar 3, 2025Updated last year
- [T-PAMI 2024] & [CVPR 2023] Vote2Cap-DETR; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning met…☆104Aug 17, 2024Updated last year
- Official repository for "TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving"☆23Sep 1, 2025Updated 6 months ago
- [Neural Networks 2025] The official code for the paper "MNet: A Multi-Scale Network for Visible Watermark Removal."☆17Jun 16, 2025Updated 8 months ago
- ☆18Jun 10, 2025Updated 8 months ago
- (ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback☆38Jun 24, 2025Updated 8 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models☆21Jan 29, 2025Updated last year
- [EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆103Nov 9, 2024Updated last year
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding☆24Mar 24, 2025Updated 11 months ago
- ☆13Feb 2, 2025Updated last year
- [ICLR 2026] Official Implementation of ProxyThinker: Test-Time Guidance through Small Visual Reasoners.☆20Sep 24, 2025Updated 5 months ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference☆10Dec 15, 2024Updated last year
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research☆22Sep 23, 2025Updated 5 months ago
- ☆11Nov 12, 2018Updated 7 years ago
- ☆16Sep 4, 2025Updated 6 months ago
- ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025☆13Aug 25, 2025Updated 6 months ago
- This is the official implementation of our paper: “VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame In…☆16Dec 5, 2025Updated 2 months ago
- This is the official code repository for the paper: Towards General Continuous Memory for Vision-Language Models.☆20Jul 3, 2025Updated 8 months ago
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"☆123Oct 2, 2025Updated 5 months ago
- Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision☆11Jul 22, 2024Updated last year
- Implementation of the paper Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs☆12Jun 7, 2025Updated 8 months ago
- Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"☆19Jan 18, 2026Updated last month
- The implementation for FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts (ICCV25)☆14Jun 26, 2025Updated 8 months ago
- Official implementation of Add-SD: Rational Generation without Manual Reference.☆28Aug 19, 2024Updated last year
- ☆46Dec 30, 2024Updated last year
- Official codes for "Q-Ground: Image Quality Grounding with Large Multi-modality Models", ACM MM2024 (Oral)☆44Oct 25, 2024Updated last year
- [ICCV2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation☆33Aug 18, 2025Updated 6 months ago
- Powerful Python-based tool for scraping Tweets, user data, and trends from Twitter without needing API access or authentication, offering…☆130Jan 4, 2025Updated last year
- (ACL-2025 main conference) SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automat…☆322Aug 27, 2025Updated 6 months ago
- [EMNLP 2024] SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information☆12Oct 11, 2024Updated last year
- [CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆65Updated this week
- 数字底座是一款面向大型政府、企业数字化转型,基于身份认证、组织架构、岗位职务、应用系统、资源角色、数据目录、安全控制等功能构建的统一且安全的管理支撑平台。数字底座基于三员管理模式,具备微服务、多租户、容器化和国产化,支持用户利用代码生成器快速构建自己的业务应用,同时可关联诸…☆2,574Updated this week
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models☆138Aug 21, 2025Updated 6 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆66Jun 10, 2025Updated 8 months ago
- [CVPR'24] Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression☆15Jul 1, 2024Updated last year
- Comprehensive AI-powered urban development optimization platform that combines deep learning and reinforcement learning for data-driven b…☆35Nov 26, 2025Updated 3 months ago
- A SAR domain-specific language defined in CXX & Python. Keywords: AST, MLIR, LLVM, FPGA HLS. Currently under development...☆17Dec 3, 2025Updated 3 months ago