kdr / videoRAG-mrr2024Links

Supporting code for: Video Enriched Retrieval Augmented Generation Using Aligned Video Captions

☆27

Alternatives and similar repositories for videoRAG-mrr2024

Users that are interested in videoRAG-mrr2024 are comparing it to the libraries listed below

Sorting:

Christina200 / Online-LoRA-official
[WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L…
☆46Updated 8 months ago
13331112522 / v-rag
Visual RAG using less than 300 lines of code.
☆28Updated last year
harrytea / TGDoc
"Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023
☆14Updated 7 months ago
IntelLabs / GraVi-T
Graph learning framework for long-term video understanding
☆65Updated last week
SHI-Labs / OLA-VLM
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆60Updated 4 months ago
PKU-YuanGroup / LLaVA-o1
☆56Updated 8 months ago
NVlabs / STL
Official Pytorch Implementation of Self-emerging Token Labeling
☆34Updated last year
MCR-PEFT / Ex-MCR
☆45Updated 2 months ago
XiaoduoAILab / XmodelVLM
☆68Updated last year
xinghaochen / SqueezeTime
Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding"
☆33Updated last year
FactoDeepLearning / MultitaskVLFM
☆26Updated last year
Hao840 / ADEM-VL
PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"
☆20Updated 8 months ago
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆84Updated 5 months ago
Thorin215 / FocusedAD
Repo of FocusedAD
☆13Updated 3 months ago
mfarre / Video-LLaVA-7B-hf-CinePile
Video-LlaVA fine-tune for CinePile evaluation
☆51Updated 11 months ago
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42Updated 11 months ago
LHBuilder / SA-Segment-Anything
Vision-oriented multimodal AI
☆49Updated last year
callsys / TextVR
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆26Updated last year
nttmdlab-nlp / VDocRAG
[CVPR2025] VDocRAG: Retirval-Augmented Generation over Visually-Rich Documents
☆32Updated last month
SkalskiP / gradio_image_annotator
A component that allows you to annotate an image with points and boxes.
☆21Updated last year
TencentARC / ViSFT
☆34Updated last year
EternityYW / Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆36Updated last year
htqin / GoogleBard-VisUnderstand
How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges
☆30Updated last year
top-yun / SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models.
☆18Updated 6 months ago
vis-nlp / ChartGemma
☆63Updated last year
DCDmllm / HyperLLaVA
Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28Updated last year
ZechengLi19 / CIM
[IJCAI'23] Complete Instances Mining for Weakly Supervised Instance Segmentation
☆37Updated last year
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆27Updated 2 months ago
roboflow / cvevals
Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…
☆36Updated last year
ml-jku / semantic-image-text-alignment
☆24Updated 2 years ago