google-deepmind / videoprism
Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024)
☆64 · Updated this week
Alternatives and similar repositories for videoprism
Users who are interested in videoprism are comparing it to the repositories listed below.
- A streamlined implementation of Grounding DINO and SAM for advanced image segmentation. This lightweight solution simplifies the integrat… ☆64 · Updated 8 months ago
- The official GitHub Page for MiniMax ☆45 · Updated 3 weeks ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆60 · Updated 4 months ago
- ☆13 · Updated 6 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆52 · Updated 7 months ago
- Official PyTorch implementation of TokenSet. ☆121 · Updated 3 months ago
- Incredibly descriptive audiovisual summaries for videos ☆41 · Updated 10 months ago
- Faster parallel inference of the mochi-1 video generation model ☆121 · Updated 4 months ago
- ☆29 · Updated last year
- Testing and evaluating the capabilities of Vision-Language models (PaliGemma) in performing computer vision tasks such as object detectio… ☆81 · Updated last year
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing ☆69 · Updated last year
- ☆16 · Updated 3 months ago
- Visual RAG using less than 300 lines of code. ☆28 · Updated last year
- Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent, Audio, Image, Video, Music and 3D… ☆36 · Updated 4 months ago
- A minimalistic, hackable code base to fine-tune the Wan video generation model ☆40 · Updated 2 months ago
- A one-stop library to standardize the inference and evaluation of all the conditional video generation models. ☆48 · Updated 4 months ago
- ☆78 · Updated 8 months ago
- Community ComfyUI workflows running on fal.ai ☆57 · Updated 9 months ago
- ☆56 · Updated 7 months ago
- [arXiv] On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices ☆116 · Updated 4 months ago
- Implementation of Mind Evolution, Evolving Deeper LLM Thinking, from DeepMind ☆53 · Updated 3 weeks ago
- ☆70 · Updated 8 months ago
- ☆13 · Updated last year
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆131 · Updated 7 months ago
- Video-LLaVA fine-tune for CinePile evaluation ☆51 · Updated 10 months ago
- Fine-tune of Florence-2 for shot categorization. ☆24 · Updated 3 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆36 · Updated last year
- 🔥 Official impl. of "DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction" ☆111 · Updated last week
- Implementation of the premier Text to Video model from OpenAI ☆57 · Updated 7 months ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year