kdr / videoRAG-mrr2024
Supporting code for: Video Enriched Retrieval Augmented Generation Using Aligned Video Captions
☆26 Updated 9 months ago
Alternatives and similar repositories for videoRAG-mrr2024
Users who are interested in videoRAG-mrr2024 are comparing it to the libraries listed below.
- ☆43 Updated 3 weeks ago
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆37 Updated 6 months ago
- ☆14 Updated last month
- Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆15 Updated last month
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆60 Updated 2 months ago
- Visual RAG using less than 300 lines of code. ☆27 Updated last year
- ☆20 Updated last year
- Tools for merging pretrained large language models. ☆19 Updated 11 months ago
- Repo of FocusedAD ☆12 Updated last month
- ScrollNet for Continual Learning ☆11 Updated last year
- ☆57 Updated 5 months ago
- Visualize multi-model embedding spaces. The first goal is to quickly get a lay of the land of any embedding space. Then be able to scroll… ☆27 Updated last year
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 Updated 6 months ago
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆21 Updated 9 months ago
- SMILE: A Multimodal Dataset for Understanding Laughter ☆14 Updated last year
- (WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H… ☆84 Updated 3 months ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023 ☆14 Updated 5 months ago
- Verifiers for LLM Reinforcement Learning ☆50 Updated last month
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla… ☆58 Updated 7 months ago
- Video-LlaVA fine-tune for CinePile evaluation ☆51 Updated 9 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆21 Updated 3 weeks ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated 3 weeks ago
- Load any clip model with a standardized interface ☆21 Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆21 Updated last month
- ☆27 Updated last month
- ☆24 Updated last year
- ☆88 Updated last year
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆18 Updated 4 months ago
- Using Gradio interface to build UI for converting text to speech ☆13 Updated 4 years ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆34 Updated 4 months ago