huggingface / fineVideo
☆52 · Updated last month
Related projects
Alternatives and complementary repositories for fineVideo
- Recaption large (Web)Datasets with vLLM and save the artifacts. ☆30 · Updated last month
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆62 · Updated 3 weeks ago
- Video-LLaVA fine-tune for CinePile evaluation ☆37 · Updated 3 months ago
- Code for the paper: Harnessing Webpage UIs for Text-Rich Visual Understanding ☆37 · Updated 3 weeks ago
- ☆62 · Updated last month
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆77 · Updated last year
- Implementation of the premier Text to Video model from OpenAI ☆57 · Updated this week
- Framework agnostic computer vision inference. Run 1000+ models by changing only one line of code. Supports models from transformers, timm… ☆119 · Updated this week
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES" ☆89 · Updated last week
- Python library to evaluate VLMs' robustness across diverse benchmarks ☆168 · Updated 2 weeks ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation" ☆161 · Updated 4 months ago
- ☆57 · Updated last month
- PyTorch implementation of MIMO, Controllable Character Video Synthesis with Spatial Decomposed Modeling, from Alibaba Intelligence Group ☆126 · Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆178 · Updated last month
- ☆259 · Updated last week
- E5-V: Universal Embeddings with Multimodal Large Language Models ☆169 · Updated 3 months ago
- A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions. ☆98 · Updated last month
- PyTorch implementation of models from the Zamba2 series. ☆158 · Updated this week
- ☆44 · Updated last month
- Faster parallel inference of the Mochi video generation model ☆53 · Updated this week
- WebLINX is a benchmark for building web navigation agents with conversational capabilities ☆116 · Updated last month
- Data release for the ImageInWords (IIW) paper. ☆200 · Updated 5 months ago
- Video+code lecture on building nanoGPT from scratch ☆64 · Updated 5 months ago
- Multimodal language model benchmark, featuring challenging examples ☆148 · Updated 3 months ago
- This repository includes the code to download the curated HuggingFace papers into a single Markdown-formatted file ☆14 · Updated 3 months ago
- A family of highly capable yet efficient large multimodal models ☆161 · Updated 2 months ago
- Cerule - A Tiny Mighty Vision Model ☆67 · Updated 2 months ago
- Implementation of the proposed MaskBit from Bytedance AI ☆58 · Updated 3 weeks ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆133 · Updated last month
- LL3M: Large Language and Multi-Modal Model in Jax ☆64 · Updated 6 months ago