mybearyZhang / TwoStageReason
Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning
☆11 Updated 4 months ago
Alternatives and similar repositories for TwoStageReason:
Users who are interested in TwoStageReason are comparing it to the repositories listed below.
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP) ☆24 Updated 7 months ago
- Official implementation of Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement. ☆25 Updated 5 months ago
- Code for our ICML'24 on multimodal dataset distillation ☆35 Updated 4 months ago
- Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024) ☆26 Updated 3 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆81 Updated 5 months ago
- ☆28 Updated 7 months ago
- [AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral) ☆39 Updated 10 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 Updated 10 months ago
- ☆100 Updated this week
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆27 Updated 3 months ago
- [AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA ☆19 Updated 7 months ago
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆14 Updated last month
- Accepted by CVPR 2024 ☆31 Updated 9 months ago
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt… ☆35 Updated last month
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆31 Updated 11 months ago
- ☆65 Updated 2 months ago
- ☆15 Updated 8 months ago
- [NeurIPS 2023] Generalized Logit Adjustment ☆34 Updated 9 months ago
- [ICML 2024] SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning ☆27 Updated 4 months ago
- [ICLR 2024 Poster] SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos ☆17 Updated 2 months ago
- VisualGPTScore for visio-linguistic reasoning ☆26 Updated last year
- [NeurIPS-2024] The official Implementation of "Instruction-Guided Visual Masking" ☆33 Updated 3 months ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆62 Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆67 Updated 8 months ago
- Official implementation of the CVPR'24 paper [Adaptive Slot Attention: Object Discovery with Dynamic Slot Number] ☆32 Updated 3 weeks ago
- ☆14 Updated 3 months ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ☆21 Updated 5 months ago
- The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? ☆24 Updated 3 months ago