hanghuacs/FineCaption

hanghuacs / FineCaption

☆39

Alternatives and similar repositories for FineCaption

Users who are interested in FineCaption are comparing it to the repositories listed below.

hanghuacs / MMComposition
☆17 Jun 20, 2025 Updated last year
yunlong10 / CAT-V
[AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…
☆68 Jan 27, 2026 Updated 5 months ago
yeates / Aurora
Aurora: Unified Video Editing with a Tool-Using Agent
☆59 Jun 16, 2026 Updated last month
yunlong10 / Video-R4
Reinforcing Text-Rich Video Reasoning with Visual Rumination
☆28 Jun 5, 2026 Updated last month
heliossun / LaCoT
[NeurIPS 2025] Official code for paper: Latent Chain-of-Thought for Visual Reasoning
☆36 Oct 16, 2025 Updated 9 months ago
Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆64 Apr 8, 2024 Updated 2 years ago
davidserra9 / abair
[CVIU'26] Adaptive Blind All-in-One Image Restoration
☆35 Mar 17, 2025 Updated last year
Tanveer81 / RGNet
This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos
☆20 Mar 3, 2025 Updated last year
yunlong10 / Awesome-Video-LMM-Post-Training
🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training
☆296 Mar 3, 2026 Updated 4 months ago
haoxiangzhao12138 / REIR
[ACMMM'25] Referring Expression Instance Retrieval and A Strong End-to-End Baseline
☆19 Apr 7, 2026 Updated 3 months ago
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆172 Jul 30, 2024 Updated last year
amazon-science / camml
CaMML: Context-Aware MultiModal Learner for Large Models (ACL 2024 SAC Award)
☆15 May 21, 2025 Updated last year
HITsz-TMG / Cognitive-Visual-Language-Mapper
The codes and datasets about our ACL 2024 Main Conference paper titled "Cognitive Visual-Language Mapper: Advancing Multimodal Comprehens…
☆17 Jan 24, 2025 Updated last year
hrtang22 / MUSE
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"
☆26 Feb 2, 2025 Updated last year
AICyberTeam / AIR-CD
We build a challenging cloud detection dataset called AIR-CD, with higher spatial resolution and more representative landcover types.
☆14 Jan 28, 2021 Updated 5 years ago
Job-Bench / job-bench-eval
Official eval scripts for JobBench
☆29 Jul 18, 2026 Updated last week
Mozhgan91 / LEO
LEO: A powerful Hybrid Multimodal LLM
☆20 Jan 18, 2025 Updated last year
Dmmm1997 / C3VG
[AAAI2025 selected as oral] - Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
☆45 Jul 2, 2025 Updated last year
WSH032 / image-deduplicate-cluster-webui
A WebUI script that deduplicates images or clusters them by features extracted from tags or WD14.
☆12 Aug 8, 2023 Updated 2 years ago
Ivy-zoe / script
same script
☆12 Nov 25, 2019 Updated 6 years ago
gaostar123 / DeViL
[ACM MM 2026] Detector-Empowered Video Large Language Model for Efficient Spatio-Temporal Grounding
☆27 Jul 12, 2026 Updated 2 weeks ago
chandrasekaraditya / ReMOVE
Official Implementation for "ReMOVE: A Reference-free Metric for Object Erasure"
☆25 Apr 30, 2024 Updated 2 years ago
Birch-san / regional-attn
☆19 Aug 19, 2024 Updated last year
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆81 Apr 19, 2025 Updated last year
twowwj / UVMap-ID
[ACMMM24] UVMap-ID: A Controllable and Personalized UV Map Generative Model
☆22 Aug 9, 2024 Updated last year
2404589803 / hf-daily-paper-newsletter-multilingual
🤖 A multilingual translation tool that automatically converts Hugging Face's daily AI research papers into 🇯🇵 Japanese, 🇰🇷 Korean, …
☆18 Updated this week
JiazuoYu / Fines
Code for the paper "FineRS: Fine-grained Reasoning and Segmentation of Small Objects with Reinforcement Learning" (NeurIPS 2025).
☆15 Jan 29, 2026 Updated 5 months ago
qwang666 / RoomTex-
[ECCV24] Official code for RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
☆32 Sep 3, 2024 Updated last year
PRIS-CV / CineTechBench
A Benchmark for Cinematographic Technique Understanding and Generation
☆29 Sep 19, 2025 Updated 10 months ago
DataCTE / SDXL-Training-Improvements
📊 Research-focused SDXL training framework exploring novel optimization approaches. Goals include enhanced image quality, training stabi…
☆21 Jun 7, 2025 Updated last year
facebookresearch / PartDistillation
Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation"
☆60 Dec 17, 2023 Updated 2 years ago
shuheikurita / RefEgo
☆13 Jul 20, 2024 Updated 2 years ago
lntzm / MESM
The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)
☆32 Mar 29, 2024 Updated 2 years ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64 Nov 7, 2024 Updated last year
HaiyanHuang98 / NWPU-Captions
☆18 Dec 7, 2022 Updated 3 years ago
rmcong / SDDNet_ACMMM23
☆20 Nov 22, 2023 Updated 2 years ago
BakeLab / Visual-Aesthetic-Benchmark
☆32 May 15, 2026 Updated 2 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆183 Oct 14, 2024 Updated last year
penghao-wu / vstar
PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"
☆707 Jan 7, 2024 Updated 2 years ago