huggingface / fineVideo
☆75Updated 8 months ago
Alternatives and similar repositories for fineVideo
Users who are interested in fineVideo are comparing it to the libraries listed below.
- ☆70Updated last month
- Official PyTorch implementation of TokenSet.☆121Updated 2 months ago
- Video-LLaVA fine-tune for CinePile evaluation☆51Updated 9 months ago
- Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"☆173Updated 11 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…☆75Updated 5 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts.☆52Updated 6 months ago
- LL3M: Large Language and Multi-Modal Model in Jax☆72Updated last year
- ☆77Updated 8 months ago
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆207Updated this week
- Collection of scripts to build small-scale datasets for fine-tuning video generation models.☆58Updated 2 months ago
- Matryoshka Multimodal Models☆107Updated 4 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆203Updated 5 months ago
- ☆62Updated 10 months ago
- ☆63Updated 8 months ago
- 🦾 EvalGIM (pronounced as "EvalGym") is an evaluation library for generative image models. It enables easy-to-use, reproducible automatic…☆80Updated 5 months ago
- Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!☆110Updated 3 months ago
- Multimodal language model benchmark, featuring challenging examples☆168Updated 5 months ago
- ☆64Updated last year
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video…☆34Updated 11 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]☆66Updated last month
- Implementation of the proposed MaskBit from Bytedance AI☆80Updated 6 months ago
- M4 experiment logbook☆57Updated last year
- Focused on fast experimentation and simplicity☆73Updated 5 months ago
- Official implementation of "Art-Free Generative Models: Art Creation Without Graphic Art Knowledge"☆31Updated last month
- [ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"☆132Updated 11 months ago
- An open source implementation of CLIP (With TULIP Support)☆147Updated 3 weeks ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆132Updated 4 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)☆156Updated last month
- Just another reasonably minimal repo for class-conditional training of pixel-space diffusion transformers.☆88Updated last week
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆72Updated this week