snumprlab / isr-dpo
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO
☆12 · Updated this week
Alternatives and similar repositories for isr-dpo:
Users who are interested in isr-dpo are comparing it to the repositories listed below:
- Egocentric Video Understanding Dataset (EVUD) ☆24 · Updated 6 months ago
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆30 · Updated 6 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆56 · Updated 3 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ☆32 · Updated 4 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆36 · Updated 8 months ago
- Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆16 · Updated last month
- 【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs" ☆17 · Updated 3 months ago
- NegCLIP. ☆29 · Updated last year
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆56 · Updated 3 months ago
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆49 · Updated 2 months ago
- VisualGPTScore for visio-linguistic reasoning ☆26 · Updated last year
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval" ☆16 · Updated 4 months ago
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆16 · Updated this week
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024) ☆24 · Updated 4 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆58 · Updated 7 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆32 · Updated 3 months ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆26 · Updated 2 months ago
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" (NeurIPS 23 Spotlight) ☆36 · Updated last year
- ☆15 · Updated last month
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆54 · Updated 7 months ago
- Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023) ☆74 · Updated 5 months ago
- ☆22 · Updated 3 months ago
- Language Repository for Long Video Understanding ☆31 · Updated 6 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆63 · Updated 6 months ago
- [AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral) ☆39 · Updated 9 months ago
- [WACV 2025] Official Pytorch code for "Background-aware Moment Detection for Video Moment Retrieval" ☆13 · Updated 3 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆41 · Updated 9 months ago
- [ICML2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" ☆20 · Updated last week
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024) ☆42 · Updated 5 months ago
- Official PyTorch code of "Unlocking Video-LLM via Agent-of-Thoughts Distillation". ☆13 · Updated last month