huggingface / fineVideo
☆52Updated 2 months ago
Related projects
Alternatives and complementary repositories for fineVideo
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆62Updated 3 weeks ago
- Video-LLaVA fine-tune for CinePile evaluation☆38Updated 3 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts.☆30Updated last month
- Implementation of the premier Text to Video model from OpenAI☆57Updated last week
- A family of highly capable yet efficient large multimodal models☆166Updated 2 months ago
- Multimodal language model benchmark, featuring challenging examples☆149Updated 3 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆169Updated 3 weeks ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆38Updated last month
- Data release for the ImageInWords (IIW) paper.☆200Updated this week
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️☆77Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆179Updated last month
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"☆91Updated 2 weeks ago
- E5-V: Universal Embeddings with Multimodal Large Language Models☆173Updated 4 months ago
- Video+code lecture on building nanoGPT from scratch☆64Updated 5 months ago
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models☆174Updated this week
- LLaVA-HR: High-Resolution Large Language-Vision Assistant☆212Updated 3 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆137Updated last week
- SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models☆173Updated 2 months ago
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆142Updated last week
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated 2 months ago
- PyTorch implementation of models from the Zamba2 series.☆158Updated this week
- This repo contains the code and data for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks"☆69Updated last week
- VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024☆261Updated 7 months ago
- Cerule - A Tiny Mighty Vision Model☆67Updated 2 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆38Updated 4 months ago
- LLM2CLIP makes SOTA pretrained CLIP models more SOTA than ever.☆242Updated this week
- Matryoshka Multimodal Models☆82Updated this week