☆86Apr 21, 2025Updated 10 months ago
Alternatives and similar repositories for Finedefics_ICLR2025
Users that are interested in Finedefics_ICLR2025 are comparing it to the libraries listed below
- ☆38Jan 12, 2026Updated last month
- [ICLR 2026] SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models☆74Jan 29, 2026Updated last month
- ☆11Jan 27, 2020Updated 6 years ago
- FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens☆17Sep 8, 2025Updated 6 months ago
- The official repo for the DanQing dataset.☆30Jan 16, 2026Updated last month
- Transactions on Multimedia (TMM25)☆19Apr 8, 2025Updated 11 months ago
- LMM solved catastrophic forgetting, AAAI2025☆46Apr 15, 2025Updated 10 months ago
- [EMNLP 2024] Official repository for paper "From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis"☆21Oct 15, 2024Updated last year
- ☆23Oct 11, 2024Updated last year
- A Simple Framework of Small-scale LMMs for Video Understanding☆109Jun 11, 2025Updated 8 months ago
- Official implementation of the paper "Bind-Your-Avatar: Multi-Talking-Character Video Generation with Dynamic 3D-mask-based Embedding Rou…☆34Sep 25, 2025Updated 5 months ago
- 👆Pytorch implementation of "Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion"☆33Jul 28, 2025Updated 7 months ago
- (CVPR2024 Highlight) Novel Class Discovery for Ultra-Fine-Grained Visual Categorization (UFG-NCD)☆23Jul 1, 2024Updated last year
- [CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations☆131Sep 1, 2025Updated 6 months ago
- ☆42Nov 27, 2025Updated 3 months ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆28Aug 15, 2025Updated 6 months ago
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆77May 23, 2025Updated 9 months ago
- FineCLIP: Self-distilled Region-based CLIP for Better Fine-grained Understanding (NIPS24)☆35Nov 12, 2025Updated 3 months ago
- A Data collector for self-driving using GTA5☆31Jul 24, 2017Updated 8 years ago
- ☆21Dec 14, 2025Updated 2 months ago
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'☆2,308Oct 29, 2025Updated 4 months ago
- ☆109Aug 14, 2025Updated 6 months ago
- Official GitHub repo for Learning Normal Flow Directly from Event Neighborhoods (ICCV2025). It is an easy-to-use API for event-based norm…☆19Oct 5, 2025Updated 5 months ago
- ☆17May 25, 2025Updated 9 months ago
- CVPR2025: Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning☆38Mar 21, 2025Updated 11 months ago
- Bambo is a new proxy framework. Compared with mainstream frameworks, it is more lightweight and flexible and can handle various load task…☆33Feb 10, 2025Updated last year
- [NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models☆75May 31, 2025Updated 9 months ago
- [CVPR 2025] T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation☆106Oct 25, 2025Updated 4 months ago
- [ICLR'24] Democratizing Fine-grained Visual Recognition with Large Language Models☆190Jul 15, 2024Updated last year
- ☆10May 20, 2021Updated 4 years ago
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation☆12Dec 5, 2025Updated 3 months ago
- This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV …☆24Dec 4, 2025Updated 3 months ago
- Towards Photorealistic 4D Scene Generation via Video Diffusion Models☆20Jun 12, 2024Updated last year
- Official repository for "LFR-GAN: Local Feature Refinement based Generative Adversarial Network for Text-to-Image Generation" (TOMM 2023)…☆11Mar 21, 2023Updated 2 years ago
- uniapp front-end AI chat template☆13Apr 9, 2025Updated 11 months ago
- The official PyTorch implementation of RefRef: A Synthetic Dataset and Benchmark for Reconstructing Refractive and Reflective Objects☆15Mar 2, 2026Updated last week
- SfMEdu System from Princeton for Dense 3D Reconstruction☆11Dec 11, 2019Updated 6 years ago
- JoyType: A Robust Design for Multilingual Visual Text Creation☆39Sep 21, 2025Updated 5 months ago
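- This project is a secondary development of the coze-studio project and follows its Apache 2.0 license. It mainly modifies and reuses the workflow code to serve as the workflow module of the China Unicom Yuanjing Wanwu agent platform.☆28Feb 28, 2026Updated last week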