Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)
☆12 · Mar 6, 2025 · Updated 11 months ago
Alternatives and similar repositories for DeCapBench
Users that are interested in DeCapBench are comparing it to the libraries listed below
- An Arena-style Automated Evaluation Benchmark for Detailed Captioning ☆57 · Jun 1, 2025 · Updated 8 months ago
- Dataset and code for our paper "New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Cat… ☆14 · Dec 14, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆12 · Nov 14, 2025 · Updated 3 months ago
- Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation ☆12 · Feb 16, 2025 · Updated last year
- DL Backtrace is a new explainability technique for deep learning models that works for any modality and model type. ☆23 · Feb 16, 2026 · Updated last week
- Official implementation of Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information ☆11 · Sep 28, 2023 · Updated 2 years ago
- Machine Reading Comprehension has attracted significant interest in research on natural language understanding, and large-scale datasets … ☆10 · Aug 14, 2021 · Updated 4 years ago
- ☆23 · Jun 19, 2025 · Updated 8 months ago
- ☆22 · Dec 23, 2025 · Updated 2 months ago
- ☆22 · Dec 11, 2025 · Updated 2 months ago
- Implementation code for the paper: Multimodal Sentiment Analysis with Mutual Information-based Disentangled Representation Learning ☆18 · May 8, 2025 · Updated 9 months ago
- [IEEE TIP] Official implementation for the work "BadCM: Invisible Backdoor Attack against Cross-Modal Learning". ☆14 · Aug 30, 2024 · Updated last year
- Guide for the slp group on how to use the Grnet cluster ☆11 · Apr 16, 2020 · Updated 5 years ago
- Official implementation of REArtGS (NeurIPS 2025) ☆19 · Oct 24, 2025 · Updated 4 months ago
- Image Text Segmentation using FAST corner detection and DBSCAN clustering with k-d tree data structure ☆14 · Feb 27, 2019 · Updated 7 years ago
- [AAAI 2026] Official Code for VQAThinker: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning ☆19 · Nov 28, 2025 · Updated 3 months ago
- ☆13 · May 17, 2025 · Updated 9 months ago
- ☆16 · Aug 15, 2024 · Updated last year
- An implementation of several unsupervised object discovery models (Slot Attention, SLATE, GNM) in PyTorch with pre-trained models. ☆14 · May 26, 2025 · Updated 9 months ago
- ☆12 · Apr 19, 2024 · Updated last year
- Crossmodal Translation based Meta Weight Adaption for Robust Image-Text Sentiment Analysis ☆15 · May 16, 2024 · Updated last year
- Remote sensing labwork ☆12 · Feb 27, 2018 · Updated 8 years ago
- This is the repo for "Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition", CVPR 2025. ☆20 · Dec 22, 2025 · Updated 2 months ago
- Official training code for MUG-V 10B video generation model. Built on Megatron-LM (v0.14.0) with production-ready distributed training fo… ☆19 · Oct 20, 2025 · Updated 4 months ago
- Official repository for ACM Multimedia'24 paper "MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube a… ☆18 · Aug 11, 2024 · Updated last year
- Search, download Vimeo videos and retrieve metadata in Go. ☆11 · Feb 10, 2022 · Updated 4 years ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆61 · Apr 8, 2024 · Updated last year
- Homepage ☆12 · Dec 20, 2025 · Updated 2 months ago
- ☆11 · Jan 8, 2025 · Updated last year
- ☆13 · Jun 5, 2024 · Updated last year
- Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving ☆16 · Feb 10, 2025 · Updated last year
- [ICLR 2023] PyTorch code for DFPC: Data flow driven pruning of coupled channels without data. ☆15 · Aug 25, 2023 · Updated 2 years ago
- [EMNLP'2024 Findings] Explore generated documents for enhanced IR with LLMs. We enhance BM25 to surpass strong dense retriever on many da… ☆15 · Mar 28, 2025 · Updated 11 months ago
- [ICCV 2025] The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs". ☆22 · Oct 28, 2025 · Updated 4 months ago
- AI Router ☆14 · Aug 1, 2024 · Updated last year
- Official Repository for "Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality" (ECCV 2024) ☆16 · Oct 29, 2024 · Updated last year
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr… ☆34 · Jul 3, 2025 · Updated 7 months ago
- Lightweight piece tokenization library ☆12 · Apr 15, 2024 · Updated last year
- Expose a server running on your local machine to the internet, like Ngrok, based on Netty ☆14 · Jun 1, 2021 · Updated 4 years ago