kdr / videoRAG-mrr2024
Supporting code for: Video Enriched Retrieval Augmented Generation Using Aligned Video Captions
☆22 · Updated 7 months ago
Alternatives and similar repositories for videoRAG-mrr2024:
Users who are interested in videoRAG-mrr2024 compare it to the repositories listed below.
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆31 · Updated 3 months ago
- Official code repository for the paper "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts" ☆29 · Updated 4 months ago
- Visual RAG using less than 300 lines of code. ☆25 · Updated 11 months ago
- ☆41 · Updated last year
- Official PyTorch implementation of Self-emerging Token Labeling ☆32 · Updated 10 months ago
- Clipora is a powerful toolkit for fine-tuning OpenCLIP models using Low Rank Adapters (LoRA). ☆19 · Updated 6 months ago
- Towards a rotationally invariant convolutional layer ☆10 · Updated 6 years ago
- PyTorch implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated last week
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 · Updated 8 months ago
- Uses a Gradio interface to build a UI for converting text to speech ☆12 · Updated 4 years ago
- arXiv 23 "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" ☆14 · Updated 2 months ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Updated last year
- ☆41 · Updated 8 months ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆24 · Updated last week
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆41 · Updated 6 months ago
- ViT trained on COYO-Labeled-300M dataset ☆31 · Updated 2 years ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated 10 months ago
- ☆43 · Updated 4 months ago
- ☆11 · Updated 2 years ago
- ☆12 · Updated 5 months ago
- Video-LLaVA fine-tune for CinePile evaluation ☆47 · Updated 6 months ago
- Multi-modal language modeling with image, audio, and text integration, including multiple images and multiple audio clips in a single multi-turn conversation. ☆17 · Updated last year
- Code for the AAAI 2023 paper "Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models" ☆17 · Updated 2 years ago
- SAM-CLIP module for use with Autodistill. ☆13 · Updated last year
- Graph learning framework for long-term video understanding ☆59 · Updated 2 weeks ago
- Implementation of the DocLLM paper for Llama models. ☆12 · Updated 2 months ago
- Implementation of CounterCurate, a data curation pipeline for both physical and semantic counterfactual image-caption pairs. ☆18 · Updated 7 months ago
- Load any CLIP model with a standardized interface ☆21 · Updated 9 months ago