HKUDS / VideoRAG
"VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos"
☆510 Updated this week
Alternatives and similar repositories for VideoRAG:
Users who are interested in VideoRAG are comparing it to the repositories listed below.
- "Your Fully-Automated Personal AI Assistant, and Open-Source & Cost-Efficient Alternative to OpenAI's Deep Research" ☆838 Updated last month
- "GraphAgent: Agentic Graph Language Assistant" ☆292 Updated last month
- Build multimodal language agents for fast prototype and production ☆2,451 Updated last week
- "MiniRAG: Making RAG Simpler with Small and Free Language Models" ☆901 Updated this week
- Ola: Pushing the Frontiers of Omni-Modal Language Model ☆320 Updated last month
- Pioneering Multimodal Reasoning with CoT ☆1,157 Updated this week
- [NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions ☆1,050 Updated 5 months ago
- Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer ☆1,155 Updated 2 weeks ago
- [AAAI 2025] Official repository of Imitate Before Detect: Aligning Machine Stylistic Preference for Machine-Revised Text Detection ☆194 Updated last month
- DeepRetrieval - Hacking 🔥Real Search Engines and Text/Data Retrievers with LLM + RL ☆201 Updated this week
- Awesome-GraphRAG: A curated list of resources (surveys, papers, benchmarks, and opensource projects) on graph-based retrieval-augmented g… ☆900 Updated last week
- Align Anything: Training All-modality Model with Feedback ☆3,063 Updated last week
- Resources of our paper "FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces". New versions in the maki… ☆938 Updated last week
- ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents ☆412 Updated last week
- Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models ☆908 Updated 2 weeks ago
- 🌐 WebWalker: Benchmarking LLMs in Web Traversal ☆378 Updated 2 weeks ago
- Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. ☆1,689 Updated 2 months ago
- Open-sourced, Fast and Context-aware Action Grounding from GUI Instructions for GUI/Computer-use Agents ☆343 Updated last month
- The codes about "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts" ☆705 Updated 2 months ago
- Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥 ☆555 Updated 2 months ago
- ☆223 Updated 3 months ago
- Parsing-free RAG supported by VLMs ☆644 Updated last month
- Video generation from text&image, 1st-gen ☆833 Updated last month
- In-depth study of the graphrag ☆674 Updated this week
- ☆766 Updated last week
- ☆118 Updated last month
- Next-Generation Interactive Intelligent Programming Assistant ☆805 Updated 5 months ago
- Medical o1, Towards medical complex reasoning with LLMs ☆1,021 Updated 2 months ago
- An Innovative Agent Framework Driven by KG Engine ☆755 Updated 2 months ago
- minimal-cost for training 0.5B R1-Zero ☆673 Updated this week